topological data analysis

  • Article type: Journal Article
    Patients are complex and heterogeneous; clinical data sets are complicated by noise, missing data, and the presence of mixed-type data. Using such data sets requires understanding the high-dimensional "space of patients", composed of all measurements that define all relevant phenotypes. The current state-of-the-art merely defines spatial groupings of patients using cluster analyses. Our goal is to apply topological data analysis (TDA), a new unsupervised technique, to obtain a more complete understanding of patient space. We applied TDA to a space of 266 previously untreated patients with Chronic Lymphocytic Leukemia (CLL), using the "daisy" metric to compute distances between clinical records. We found clear evidence for both loops and voids in the CLL data. To interpret these structures, we developed novel computational and graphical methods. The most persistent loop and the most persistent void can be explained using three dichotomized, prognostically important factors in CLL: IGHV somatic mutation status, beta-2 microglobulin, and Rai stage. In conclusion, patient space turns out to be richer and more complex than current models suggest. TDA could become a powerful tool in a researcher's arsenal for interpreting high-dimensional data by providing novel insights into biological processes and improving our understanding of clinical and biological data sets.
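
The "daisy" metric in the abstract above presumably refers to the `daisy` function in R's `cluster` package, which computes Gower-style dissimilarities for mixed-type records. The following is a minimal Python sketch of that idea, not the authors' code: the function name `gower_distance`, the three fields, the assumed age range, and the patient values are all hypothetical and chosen only for illustration.

```python
import math

def gower_distance(a, b, kinds, ranges):
    """Gower-style dissimilarity between two mixed-type records.

    kinds[i] is "num" or "cat"; ranges[i] is the (max - min) range of a
    numeric column over the whole data set (ignored for "cat" columns).
    A variable missing (None) in either record is skipped.
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing value: this variable contributes nothing
        if kind == "num":
            # range-normalised absolute difference, in [0, 1]
            total += abs(x - y) / rng if rng > 0 else 0.0
        else:
            # categorical: 0 if equal, 1 if different
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else math.nan

# Hypothetical records: (age in years, IGHV status, Rai stage group)
kinds = ("num", "cat", "cat")
ranges = (40.0, None, None)  # assumed age range across the cohort
p1 = (60, "mutated", "low")
p2 = (70, "unmutated", "low")
p3 = (70, None, "high")      # IGHV status missing

d12 = gower_distance(p1, p2, kinds, ranges)  # averages over 3 variables
d13 = gower_distance(p1, p3, kinds, ranges)  # averages over 2 variables
```

Averaging only over the variables present in both records is what lets a distance of this kind tolerate the missing and mixed-type data the abstract describes; the resulting pairwise distance matrix is the kind of input a persistent homology computation can take.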

  • Article type: Editorial
    This editorial explores the emerging role of Graph Filtration Learning (GFL) in revolutionizing Hepatocellular carcinoma (HCC) imaging analysis. As traditional pixel-based methods reach their limits, GFL offers a novel approach to capture complex topological features in medical images. By representing imaging data as graphs and leveraging persistent homology, GFL unveils new dimensions of information that were previously inaccessible. This paradigm shift holds promise for enhancing HCC diagnosis, treatment planning, and prognostication. We discuss the principles of GFL, its potential applications in HCC imaging, and the challenges in translating this innovative technique into clinical practice.

  • Article type: Journal Article
    Advances in spatial proteomics and protein colocalization are a driving force in the understanding of cellular mechanisms and their influence on biological processes. New methods in the field of spatial proteomics call for the development of algorithms and open up new avenues of research. The newly introduced Molecular Pixelation (MPX) provides spatial information on surface proteins and their relationship with each other in single cells. This allows for in silico representation of neighborhoods of membrane proteins as graphs. In order to analyze this new data modality, we adapted local assortativity in networks of MPX single-cell graphs and created a method that is able to capture detailed information on the spatial relationships of proteins. The introduced method can evaluate the pairwise colocalization of proteins and access higher-order similarity to investigate the colocalization of multiple proteins at the same time. We evaluated the method using publicly available MPX datasets where T cells were treated with a chemokine to study uropod formation. We demonstrate that adjusted local assortativity detects the effects of the stimuli at both single- and multiple-marker levels, which enhances our understanding of uropod formation. We also applied our method to cancerous B-cell lines treated with a therapeutic antibody. With the adjusted local assortativity, we recapitulated the effect of rituximab on the polarity of CD20. Our computational method, together with MPX, not only improves our understanding of the formation of cell polarity and protein colocalization under stimuli but also advances the overall insight into immune reaction and reorganization of cell surface proteins, which in turn allows the design of novel therapies. We foresee its applicability to other types of biological spatial data when represented as undirected graphs.

  • Article type: Journal Article
    Wearable sensor data analysis with persistence features generated by topological data analysis (TDA) has achieved great success in various applications; however, extracting the topological features requires substantial computation and time. In this paper, our approach uses knowledge distillation (KD) with multiple teacher networks, trained on the raw time series and on persistence images generated by TDA, respectively. However, directly transferring knowledge from teacher models that take different inputs to the student model results in a knowledge gap and limited performance. To address this problem, we introduce a robust framework that integrates multimodal features from two different teachers and enables a student to learn desirable knowledge effectively. To account for statistical differences between the modalities, an entropy-based constrained adaptive weighting mechanism is leveraged to automatically balance the effects of the teachers and encourage the student model to adequately adopt the knowledge from both teachers. To assimilate dissimilar structural information generated by different style models for distillation, batch and channel similarities within a mini-batch are used. We demonstrate the effectiveness of the proposed method on wearable sensor data.

  • Article type: Journal Article
    BACKGROUND: Human placenta hydrolysates (HPH), the study of which was initiated by the scientific school of Vladimir P. Filatov, are currently being investigated using modern proteomic technologies. HPH is a promising tool for maintaining the function of mitochondria and regenerating tissues and organs with a high content of mitochondria (liver, heart muscle, skeletal muscles, etc.). The molecular mechanisms of action of HPH are practically not studied.
    OBJECTIVE: Identification of mitochondrial function-supporting peptides in HPH (Laennec, produced by Japan Bioproducts).
    METHODS: Data on the chemical structure of the peptides were collected through a mass spectrometric experiment. Then, to establish the amino acid sequences of the peptides, de novo peptide sequencing algorithms based on the mathematical theory of topological and metric analysis of chemographs were applied. Bioinformatic analysis of the peptide composition of HPH was carried out using the integral protein annotation method.
    RESULTS: The biological functions of 41 peptides in the composition of HPH have been identified and described. Among the target proteins, the activity of which is regulated by the identified peptides and significantly affects the function of mitochondria, are caspases (CASP1, CASP3, CASP4) and other proteins regulating apoptosis (BCL2, CANPL1, PPARA), MAP kinases (MAPK1, MAPK3, MAPK4, MAPK8, MAPK9, MAPK10, MAPK14), AKT1/GSK3B/MTOR cascade kinases, and a number of other target proteins (ADGRG6 receptor, inhibitor of NF-κB kinase IKKE, pyruvate dehydrogenase 2/3/4, SIRT1 sirtuin deacetylase, ULK1 kinase).
    CONCLUSIONS: HPH peptides have been identified that promote inhibition of mitochondrial pore formation, apoptosis, and excessive mitochondrial autophagy under conditions of oxidative/toxic stress, chronic inflammation, and/or hyperinsulinemia.

  • Article type: Journal Article
    Persistent homology (PH) is an approach to topological data analysis (TDA) that computes multi-scale topologically invariant properties of high-dimensional data that are robust to noise. While PH has revealed useful patterns across various applications, computational requirements have limited applications to small data sets of a few thousand points. We present Dory, an efficient and scalable algorithm that can compute the persistent homology of sparse Vietoris-Rips complexes on larger data sets, up to and including dimension two and over the field Z2. As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set containing approximately three million points. Extant algorithms were unable to process it, whereas Dory processed it within five minutes, using less than five GB of memory. Results show that the topology of the human genome changes significantly upon treatment with auxin, a molecule that degrades cohesin, corroborating the hypothesis that cohesin plays a crucial role in loop formation in DNA.
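
To make the Vietoris-Rips persistence mentioned above concrete, here is a naive Python sketch of the dimension-0 case only. It is not Dory's algorithm and does not scale the way the abstract describes; it assumes Euclidean points, the convention that an edge enters the filtration at the distance between its endpoints, and a hypothetical function name `h0_persistence`.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0. When an edge first joins two
    different components, one of them dies at that edge's length.
    Returns the n - 1 finite death scales in increasing order; the one
    remaining component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # all pairwise edges, sorted by the scale at which they appear
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths

deaths = h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

Enumerating every pairwise edge is quadratic in the number of points, which is one reason naive approaches stall on large inputs; computing loops and voids (dimensions one and two) additionally requires reducing a boundary matrix over Z2, which this sketch omits.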

  • Article type: Journal Article
    Childhood maltreatment may adversely affect brain development and consequently influence behavioral, emotional, and psychological patterns during adulthood. In this study, we propose an analytical pipeline for modeling the altered topological structure of brain white matter in maltreated and typically developing children. We perform topological data analysis (TDA) to assess the alteration in the global topology of the brain white matter structural covariance network among children. We use persistent homology, an algebraic technique in TDA, to analyze topological features in the brain covariance networks constructed from structural magnetic resonance imaging and diffusion tensor imaging. We develop a novel framework for statistical inference based on the Wasserstein distance to assess the significance of the observed topological differences. Using these methods in comparing maltreated children with a typically developing control group, we find that maltreatment may increase homogeneity in white matter structures and thus induce higher correlations in the structural covariance; this is reflected in the topological profile. Our findings strongly suggest that TDA can be a valuable framework to model altered topological structures of the brain. The MATLAB codes and processed data used in this study can be found at https://github.com/laplcebeltrami/maltreated.
    We employ topological data analysis (TDA) to investigate altered topological structures in the white matter of children who have experienced maltreatment. Persistent homology in TDA is utilized to quantify topological differences between typically developing children and those subjected to maltreatment, using magnetic resonance imaging and diffusion tensor imaging data. The Wasserstein distance is computed between topological features to assess disparities in brain networks. Our findings demonstrate that persistent homology effectively characterizes the altered dynamics of white matter in children who have suffered maltreatment.
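
As an illustration of a Wasserstein distance between topological features, the sketch below computes the p-Wasserstein distance between two small persistence diagrams by brute force. It is a generic textbook construction, not the inference framework from the study: the function name `wasserstein`, the L-infinity ground metric, and the example diagrams are assumptions made for illustration.

```python
from itertools import permutations

def wasserstein(d1, d2, p=2):
    """p-Wasserstein distance between two persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Any point may be
    matched either to a point of the other diagram or to the diagonal.
    Brute force over all matchings, so only usable for tiny diagrams.
    """
    n, m = len(d1), len(d2)
    size = n + m  # pad each side with diagonal slots for the other side

    def cost(i, j):
        a = d1[i] if i < n else None  # None means "diagonal slot"
        b = d2[j] if j < m else None
        if a is None and b is None:
            return 0.0                      # diagonal to diagonal is free
        if a is None:
            return (b[1] - b[0]) / 2        # b is matched to the diagonal
        if b is None:
            return (a[1] - a[0]) / 2        # a is matched to the diagonal
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))  # L-infinity

    best = min(
        sum(cost(i, j) ** p for i, j in enumerate(perm))
        for perm in permutations(range(size))
    )
    return best ** (1 / p)

close = wasserstein([(0.0, 4.0)], [(0.0, 3.0)])   # points matched together
empty = wasserstein([(0.0, 4.0)], [])             # point sent to diagonal
```

The factorial number of matchings makes this impractical beyond a handful of points; for realistic diagrams the same cost matrix would normally be handed to an assignment solver such as `scipy.optimize.linear_sum_assignment`.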

  • Article type: Journal Article
    Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while those based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.

  • Article type: Journal Article
    Precision medicine aims to provide personalized care based on individual patient characteristics, rather than guideline-directed therapies for groups of diseases or patient demographics. Images, both radiology- and pathology-derived, are a major source of information on presence, type, and status of disease. Exploring the mathematical relationship of pixels in medical imaging ("radiomics") and cellular-scale structures in digital pathology slides ("pathomics") offers powerful tools for extracting both qualitative and, increasingly, quantitative data. These analytical approaches, however, may be significantly enhanced by applying additional methods arising from fields of mathematics such as differential geometry and algebraic topology that remain underexplored in this context. Geometry's strength lies in its ability to provide precise local measurements, such as curvature, that can be crucial for identifying abnormalities at multiple spatial levels. These measurements can augment the quantitative features extracted in conventional radiomics, leading to more nuanced diagnostics. By contrast, topology serves as a robust shape descriptor, capturing essential features such as connected components and holes. The field of topological data analysis was initially founded to explore the shape of data, with functional network connectivity in the brain being a prominent example. Increasingly, its tools are now being used to explore organizational patterns of physical structures in medical images and digitized pathology slides. By leveraging tools from both differential geometry and algebraic topology, researchers and clinicians may be able to obtain a more comprehensive, multi-layered understanding of medical images and contribute to precision medicine's armamentarium.

  • Article type: Journal Article
    Image analysis techniques provide objective and reproducible statistics for interpreting microscopy data. At higher dimensions, three-dimensional (3D) volumetric and spatiotemporal data highlight additional properties and behaviors beyond the static 2D focal plane. However, increased dimensionality carries increased complexity, and existing techniques for general segmentation of 3D data are either primitive, or highly specialized to specific biological structures. Borrowing from the principles of 2D topological data analysis (TDA), we formulate a 3D segmentation algorithm that implements persistent homology to identify variations in image intensity. From this, we derive two separate variants applicable to spatial and spatiotemporal data, respectively. We demonstrate that this analysis yields both sensitive and specific results on simulated data and can distinguish prominent biological structures in fluorescence microscopy images, regardless of their shape. Furthermore, we highlight the efficacy of temporal TDA in tracking cell lineage and the frequency of cell and organelle replication.