Topological data analysis

  • Article type: Journal Article
    This paper is motivated by the need to stabilise the effect of deep learning (DL) training for medical image analysis on the conditioning of convolution filters, in relation to model overfitting and robustness. We present a simple strategy to reduce the condition numbers of square matrices and investigate its effect on the spatial distributions of point clouds of well- and ill-conditioned matrices. For a square matrix, the SVD surgery strategy works by: (1) computing its singular value decomposition (SVD), (2) changing a few of the smaller singular values relative to the largest one, and (3) reconstructing the matrix by reverse SVD. Applying SVD surgery to CNN convolution filters during training acts as spectral regularisation of the DL model without requiring the learning of extra parameters. The fact that the closer a matrix is to the non-invertible matrices, the higher its condition number, suggests that the spatial distributions of square matrices and those of their inverses are correlated with their condition number distributions. We examine this assertion empirically by showing that applying various versions of SVD surgery to point clouds of matrices brings their persistence diagrams (PDs) closer to those of the point clouds of their inverses.
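    The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the abstract does not say which singular values are changed or by how much, so the clamping rule and the names `svd_surgery` and `max_ratio` are hypothetical choices, not the authors' method.

```python
import numpy as np


def svd_surgery(matrix, max_ratio=10.0):
    """Clamp small singular values so that sigma_max / sigma_min <= max_ratio.

    The clamping rule is an assumed variant of step (2); the paper may use
    a different rule for choosing and changing the small singular values.
    """
    # Step 1: singular value decomposition (s is sorted in descending order).
    u, s, vt = np.linalg.svd(matrix)
    # Step 2: raise every singular value below sigma_max / max_ratio.
    floor = s[0] / max_ratio
    s_new = np.maximum(s, floor)
    # Step 3: rebuild the matrix from the modified spectrum.
    return u @ np.diag(s_new) @ vt


rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3))
a[2] = a[0] + 1e-6 * a[1]  # nearly dependent rows make the matrix ill-conditioned
b = svd_surgery(a, max_ratio=10.0)

print("condition number before:", np.linalg.cond(a))
print("condition number after: ", np.linalg.cond(b))
```

    Because the left and right singular vectors are left untouched, the repaired matrix keeps the orientation of the original filter and only its spectrum changes.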

  • Article type: Journal Article
    Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information theoretical distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex measured with the relative entropy (Kullback-Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The interest of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics.
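    As a small worked example of the setting described above, the sketch below evaluates the general Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for two generators: negative Shannon entropy, which gives the Kullback-Leibler divergence on the probability simplex, and half the squared norm, which gives half the squared Euclidean distance. The function names and the two sample distributions are illustrative and are not taken from the paper's software.

```python
import math


def bregman_divergence(p, q, f, grad_f):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_f(q), p, q))
    return f(p) - f(q) - inner


def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]


def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)


def half_sq_norm_grad(x):
    return list(x)


# Two made-up points of the probability simplex (entries sum to 1).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
d_entropy = bregman_divergence(p, q, neg_entropy, neg_entropy_grad)
d_euclid = bregman_divergence(p, q, half_sq_norm, half_sq_norm_grad)

print("KL(p || q):                 ", kl_pq)
print("Bregman, negative entropy:  ", d_entropy)
print("Bregman, half squared norm: ", d_euclid)
print("reversed arguments differ:  ", bregman_divergence(q, p, neg_entropy, neg_entropy_grad))
```

    The last line illustrates why such a space is not a metric space in the usual sense: swapping the arguments changes the value.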

  • Article type: Journal Article
    OBJECTIVE: Patients with schizophrenia typically exhibit symptoms of disorganized thought and display concreteness and over-inclusion in verbal reports, depending on the level of abstraction. While concreteness and over-inclusion may appear contradictory, the underlying psychopathology that explains these symptoms remains unclear. In the current study, we used functional magnetic resonance imaging with an encoding modeling approach to examine how concepts of various words, represented as brain activity, are anomalously connected at different levels of abstraction in patients with schizophrenia.
    METHODS: Fourteen individuals diagnosed with schizophrenia and 17 healthy controls underwent functional magnetic resonance imaging to measure brain activity representing concepts of various words. We used a persistent homology (PH) method to analyze the topological structures of word representations in schizophrenia patients, healthy controls, and random data, across different levels of abstraction by varying dissimilarity scales in the representation space.
    RESULTS: The results revealed that patients with schizophrenia exhibited more homogeneous word relationships across different levels of abstraction compared with healthy controls. Additionally, topological structures exhibited a shift toward a random network structure in patients with schizophrenia compared with controls. The PH method successfully distinguished semantic representations of patients with schizophrenia from those of controls.
    CONCLUSIONS: The current results provide an explanation for the mechanisms underlying the deficits in abstraction ability observed in schizophrenia. The isotopic connection of individual concepts reflects both the reduction of contextual connections at a semantically fine-grained scale and the absence of clear boundaries between related concepts at a coarse scale, which lead to concreteness and over-inclusion, respectively.

  • Article type: Journal Article
    Patients are complex and heterogeneous; clinical data sets are complicated by noise, missing data, and the presence of mixed-type data. Using such data sets requires understanding the high-dimensional "space of patients", composed of all measurements that define all relevant phenotypes. The current state-of-the-art merely defines spatial groupings of patients using cluster analyses. Our goal is to apply topological data analysis (TDA), a new unsupervised technique, to obtain a more complete understanding of patient space. We applied TDA to a space of 266 previously untreated patients with Chronic Lymphocytic Leukemia (CLL), using the "daisy" metric to compute distances between clinical records. We found clear evidence for both loops and voids in the CLL data. To interpret these structures, we developed novel computational and graphical methods. The most persistent loop and the most persistent void can be explained using three dichotomized, prognostically important factors in CLL: IGHV somatic mutation status, beta-2 microglobulin, and Rai stage. In conclusion, patient space turns out to be richer and more complex than current models suggest. TDA could become a powerful tool in a researcher's arsenal for interpreting high-dimensional data by providing novel insights into biological processes and improving our understanding of clinical and biological data sets.
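    For mixed-type records, the `daisy` function in R's `cluster` package computes Gower's dissimilarity. The sketch below is a plain-Python approximation of that idea, assuming range-normalised differences for numeric fields, simple mismatch for categorical fields, and skipping of missing values. The function name, the field layout and the two records are hypothetical and are not taken from the CLL data.

```python
def gower_distance(x, y, is_numeric, ranges):
    """Gower-style dissimilarity between two records; None marks a missing value."""
    total, used = 0.0, 0
    for xi, yi, numeric, value_range in zip(x, y, is_numeric, ranges):
        if xi is None or yi is None:
            continue  # a field missing in either record is left out of the average
        if numeric:
            total += abs(xi - yi) / value_range  # scaled to [0, 1] by the field's range
        else:
            total += 0.0 if xi == yi else 1.0  # categorical: match or mismatch
        used += 1
    return total / used if used else float("nan")


# Hypothetical records: (age in years, IGHV status, Rai stage).
is_numeric = (True, False, False)
ranges = (40.0, None, None)  # assumed age range across the cohort
record_a = (60, "mutated", None)
record_b = (70, "unmutated", "high")

distance = gower_distance(record_a, record_b, is_numeric, ranges)
print(distance)
```

    A matrix of such pairwise dissimilarities is the kind of input that a Vietoris-Rips persistent homology computation takes in place of Euclidean coordinates.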

  • Article type: Journal Article
    Advances in spatial proteomics and protein colocalization are a driving force in the understanding of cellular mechanisms and their influence on biological processes. New methods in the field of spatial proteomics call for the development of algorithms and open up new avenues of research. The newly introduced Molecular Pixelation (MPX) provides spatial information on surface proteins and their relationship with each other in single cells. This allows for in silico representation of neighborhoods of membrane proteins as graphs. In order to analyze this new data modality, we adapted local assortativity in networks of MPX single-cell graphs and created a method that is able to capture detailed information on the spatial relationships of proteins. The introduced method can evaluate the pairwise colocalization of proteins and access higher-order similarity to investigate the colocalization of multiple proteins at the same time. We evaluated the method using publicly available MPX datasets where T cells were treated with a chemokine to study uropod formation. We demonstrate that adjusted local assortativity detects the effects of the stimuli at both single- and multiple-marker levels, which enhances our understanding of uropod formation. We also applied our method to cancerous B-cell lines treated with a therapeutic antibody. With the adjusted local assortativity, we recapitulated the effect of rituximab on the polarity of CD20. Our computational method, together with MPX, improves our understanding of the formation of cell polarity and protein colocalization under stimuli and advances overall insight into immune reactions and the reorganization of cell surface proteins, which in turn allows the design of novel therapies. We foresee its applicability to other types of biological spatial data when represented as undirected graphs.

  • Article type: Journal Article
    Wearable sensor data analysis with persistence features generated by topological data analysis (TDA) has achieved great success in various applications; however, extracting topological features requires substantial computation and time. In this paper, our approach uses knowledge distillation (KD) with multiple teacher networks, trained on the raw time series and on persistence images generated by TDA, respectively. However, directly transferring knowledge from teacher models that take inputs with different characteristics to the student model results in a knowledge gap and limited performance. To address this problem, we introduce a robust framework that integrates multimodal features from two different teachers and enables a student to learn desirable knowledge effectively. To account for statistical differences between the modalities, an entropy-based constrained adaptive weighting mechanism is used to automatically balance the effects of the teachers and encourage the student model to adequately adopt the knowledge from both. To assimilate the dissimilar structural information generated by models of different styles for distillation, batch and channel similarities within a mini-batch are used. We demonstrate the effectiveness of the proposed method on wearable sensor data.
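    One way to picture entropy-based weighting of two teachers is sketched below in NumPy. It is a minimal illustration under assumptions, not the paper's formulation: the inverse-entropy weighting rule, the clipping bounds and the names `two_teacher_kd_loss` and `w_min` are hypothetical, and the logits are made-up values.

```python
import numpy as np

EPS = 1e-12


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p):
    return -np.sum(p * np.log(p + EPS), axis=-1)


def kl_divergence(p, q):
    return np.sum(p * (np.log(p + EPS) - np.log(q + EPS)), axis=-1)


def two_teacher_kd_loss(student_logits, t1_logits, t2_logits,
                        temperature=4.0, w_min=0.2):
    """Distillation loss with per-sample teacher weights (assumed rule)."""
    s = softmax(student_logits, temperature)
    p1 = softmax(t1_logits, temperature)
    p2 = softmax(t2_logits, temperature)
    h1, h2 = entropy(p1), entropy(p2)
    # Assumption: the more confident (lower-entropy) teacher gets the larger weight.
    w1 = h2 / (h1 + h2 + EPS)
    # Constraint: neither teacher may be ignored completely.
    w1 = np.clip(w1, w_min, 1.0 - w_min)
    w2 = 1.0 - w1
    loss = w1 * kl_divergence(p1, s) + w2 * kl_divergence(p2, s)
    return loss.mean(), w1, w2


# Made-up logits for a batch of 2 samples and 3 classes.
student = np.zeros((2, 3))
teacher_raw = np.array([[8.0, 0.0, 0.0], [0.0, 8.0, 0.0]])    # e.g. time-series teacher
teacher_image = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # e.g. persistence-image teacher

loss, w1, w2 = two_teacher_kd_loss(student, teacher_raw, teacher_image)
print(loss, w1, w2)
```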

  • Article type: Journal Article
    BACKGROUND: Human placenta hydrolysates (HPH), the study of which was initiated by the scientific school of Vladimir P. Filatov, are currently being investigated using modern proteomic technologies. HPH is a promising tool for maintaining the function of mitochondria and regenerating tissues and organs with a high content of mitochondria (liver, heart muscle, skeletal muscles, etc.). The molecular mechanisms of action of HPH are practically not studied.
    OBJECTIVE: To identify peptides in HPH (Laennec, produced by Japan Bioproducts) that support mitochondrial function.
    METHODS: Data on the chemical structure of the peptides were collected through a mass spectrometric experiment. Then, to establish the amino acid sequences of the peptides, de novo peptide sequencing algorithms based on the mathematical theory of topological and metric analysis of chemographs were applied. Bioinformatic analysis of the peptide composition of HPH was carried out using the integral protein annotation method.
    RESULTS: The biological functions of 41 peptides in the composition of HPH have been identified and described. Among the target proteins, the activity of which is regulated by the identified peptides and significantly affects the function of mitochondria, are caspases (CASP1, CASP3, CASP4) and other proteins regulating apoptosis (BCL2, CANPL1, PPARA), MAP kinases (MAPK1, MAPK3, MAPK4, MAPK8, MAPK9, MAPK10, MAPK14), AKT1/GSK3B/MTOR cascade kinases, and a number of other target proteins (ADGRG6 receptor, inhibitor of NF-κB kinase IKKE, pyruvate dehydrogenase 2/3/4, SIRT1 sirtuin deacetylase, ULK1 kinase).
    CONCLUSIONS: HPH peptides have been identified that promote inhibition of mitochondrial pore formation, apoptosis, and excessive mitochondrial autophagy under conditions of oxidative/toxic stress, chronic inflammation, and/or hyperinsulinemia.

  • Article type: Journal Article
    Persistent homology (PH) is an approach to topological data analysis (TDA) that computes multi-scale topologically invariant properties of high-dimensional data that are robust to noise. While PH has revealed useful patterns across various applications, computational requirements have limited applications to small data sets of a few thousand points. We present Dory, an efficient and scalable algorithm that can compute the persistent homology of sparse Vietoris-Rips complexes on larger data sets, up to and including dimension two and over the field Z2. As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set containing approximately three million points. Extant algorithms were unable to process it, whereas Dory processed it within five minutes, using less than five GB of memory. Results show that the topology of the human genome changes significantly upon treatment with auxin, a molecule that degrades cohesin, corroborating the hypothesis that cohesin plays a crucial role in loop formation in DNA.
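    To make the idea of persistent homology on a sparse Vietoris-Rips complex concrete, the sketch below computes only dimension 0 (connected components) with a union-find structure. It is a textbook illustration and not Dory's algorithm, which also handles dimensions one and two and scales to millions of points. The function name `h0_persistence`, the `max_edge` threshold and the sample points are illustrative.

```python
import itertools
import math


def h0_persistence(points, max_edge):
    """Death scales of connected components in a thresholded Rips filtration.

    Every point is born at scale 0. A component dies when an edge merges it
    into another one. Edges longer than max_edge are discarded, which is what
    makes the complex sparse.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for i, j in itertools.combinations(range(n), 2):
        d = math.dist(points[i], points[j])
        if d <= max_edge:
            edges.append((d, i, j))
    edges.sort()

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    n_infinite = n - len(deaths)  # components that never merge below max_edge
    return deaths, n_infinite


# Two well-separated groups of points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
deaths, n_infinite = h0_persistence(points, max_edge=3.0)
print(deaths, n_infinite)
```

    The brute-force loop over all pairs is quadratic in the number of points, so this sketch is only meant for small inputs.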

  • Article type: Journal Article
    Childhood maltreatment may adversely affect brain development and consequently influence behavioral, emotional, and psychological patterns during adulthood. In this study, we propose an analytical pipeline for modeling the altered topological structure of brain white matter in maltreated and typically developing children. We perform topological data analysis (TDA) to assess the alteration in the global topology of the brain white matter structural covariance network among children. We use persistent homology, an algebraic technique in TDA, to analyze topological features in the brain covariance networks constructed from structural magnetic resonance imaging and diffusion tensor imaging. We develop a novel framework for statistical inference based on the Wasserstein distance to assess the significance of the observed topological differences. Using these methods in comparing maltreated children with a typically developing control group, we find that maltreatment may increase homogeneity in white matter structures and thus induce higher correlations in the structural covariance; this is reflected in the topological profile. Our findings strongly suggest that TDA can be a valuable framework to model altered topological structures of the brain. The MATLAB codes and processed data used in this study can be found at https://github.com/laplcebeltrami/maltreated.
    We employ topological data analysis (TDA) to investigate altered topological structures in the white matter of children who have experienced maltreatment. Persistent homology in TDA is utilized to quantify topological differences between typically developing children and those subjected to maltreatment, using magnetic resonance imaging and diffusion tensor imaging data. The Wasserstein distance is computed between topological features to assess disparities in brain networks. Our findings demonstrate that persistent homology effectively characterizes the altered dynamics of white matter in children who have suffered maltreatment.
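    The Wasserstein distance between two persistence diagrams can be computed as an optimal matching in which any point may also be matched to the diagonal. The sketch below shows the order-1 distance with the L-infinity ground metric, using SciPy's `linear_sum_assignment`. It is a generic illustration and not the authors' MATLAB implementation or their inference framework. The function name and the toy diagrams are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_distance(dgm_a, dgm_b):
    """Order-1 Wasserstein distance between two persistence diagrams.

    Each diagram is a sequence of (birth, death) pairs.
    """
    a = np.asarray(dgm_a, dtype=float).reshape(-1, 2)
    b = np.asarray(dgm_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    # Rows: n points of A, then m diagonal slots.
    # Columns: m points of B, then n diagonal slots.
    cost = np.zeros((n + m, m + n))
    # Point-to-point cost under the L-infinity norm.
    cost[:n, :m] = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2)
    # Cost of sending a point to its nearest diagonal point: (death - birth) / 2.
    cost[:n, m:] = ((a[:, 1] - a[:, 0]) / 2.0)[:, None]
    cost[n:, :m] = ((b[:, 1] - b[:, 0]) / 2.0)[None, :]
    # Diagonal-to-diagonal matches stay free (zero cost).
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum())


# Toy diagrams with made-up (birth, death) pairs.
diagram_a = [(0.0, 1.0), (0.0, 3.0)]
diagram_b = [(0.0, 1.5)]
w = wasserstein_distance(diagram_a, diagram_b)
print(w)
```

    In a group comparison, such distances between subject-level diagrams can serve as the test statistic in a permutation test, which is one common way to assess topological differences.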

  • Article type: Journal Article
    Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on 3D structure comparison are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, and it points to future developments of topological analysis in the life sciences.