persistent homology

  • Article type: Journal Article
    Physical property prediction and synthesis process optimization are key targets in material informatics. In this study, we propose a machine learning approach that utilizes ridge regression to predict the oxygen permeability at fuel cell electrode surfaces and determine the optimal process temperature. These predictions are based on a persistence diagram derived from tomographic images captured using transmission electron microscopy (TEM). Through machine learning analysis of the complex structures present in the Pt/CeO2 nanocomposites, we discovered that l2 regularization considering diverse structural elements is more appropriate than l1 regularization (sparse modeling). Notably, our model successfully captured the activation energy of oxygen permeability, a phenomenon that could not be solely explained by the geometric feature of the Betti numbers, as demonstrated in a previous study. The correspondence between the ridge regression coefficient and persistence diagram revealed the formation process of the local and three-dimensional structures of CeO2 and their contributions to pre-exponential factor and activation energies. This analysis facilitated the determination of the annealing temperature required to achieve the optimal structure and accurately predict the physical properties.

  • Article type: Journal Article
    Childhood maltreatment may adversely affect brain development and consequently influence behavioral, emotional, and psychological patterns during adulthood. In this study, we propose an analytical pipeline for modeling the altered topological structure of brain white matter in maltreated and typically developing children. We perform topological data analysis (TDA) to assess the alteration in the global topology of the brain white matter structural covariance network among children. We use persistent homology, an algebraic technique in TDA, to analyze topological features in the brain covariance networks constructed from structural magnetic resonance imaging and diffusion tensor imaging. We develop a novel framework for statistical inference based on the Wasserstein distance to assess the significance of the observed topological differences. Using these methods in comparing maltreated children with a typically developing control group, we find that maltreatment may increase homogeneity in white matter structures and thus induce higher correlations in the structural covariance; this is reflected in the topological profile. Our findings strongly suggest that TDA can be a valuable framework to model altered topological structures of the brain. The MATLAB codes and processed data used in this study can be found at https://github.com/laplcebeltrami/maltreated.
    We employ topological data analysis (TDA) to investigate altered topological structures in the white matter of children who have experienced maltreatment. Persistent homology in TDA is utilized to quantify topological differences between typically developing children and those subjected to maltreatment, using magnetic resonance imaging and diffusion tensor imaging data. The Wasserstein distance is computed between topological features to assess disparities in brain networks. Our findings demonstrate that persistent homology effectively characterizes the altered dynamics of white matter in children who have suffered maltreatment.
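
The abstract does not spell out how the Wasserstein distance between topological features is computed. As a minimal sketch, assume each network is summarized by the same number of scalar feature values (for instance, birth values of connected components). For scalars, the optimal matching pairs values in sorted order, so the distance reduces to a sum over sorted pairs. The function name and the sample values below are hypothetical and are not taken from the authors' MATLAB code.

```python
def wasserstein_1d(values_a, values_b, p=2):
    """p-Wasserstein distance between two equal-size sets of scalar features.

    Assumption: both sets hold the same number of values, so the optimal
    matching simply pairs them in sorted order.
    """
    if len(values_a) != len(values_b):
        raise ValueError("this sketch assumes equally many features in both sets")
    if not values_a:
        return 0.0
    a, b = sorted(values_a), sorted(values_b)
    total = sum(abs(x - y) ** p for x, y in zip(a, b))
    return total ** (1.0 / p)


# Hypothetical birth values for two groups (illustrative numbers only).
group_a = [0.10, 0.35, 0.60]
group_b = [0.20, 0.30, 0.90]
distance = wasserstein_1d(group_a, group_b)
```

A permutation test on such distances is one common way to attach a p-value to a group difference; whether the paper's inference framework works this way is not stated in the abstract.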

  • Article type: Journal Article
    Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while those based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.

  • Article type: Journal Article
    Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1 norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses many limitations of traditional persistent homology. Rather than inducing filtration by varying a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins. For example, tPCA provides up to 628%, 78%, and 149% improvements over UMAP, tSNE, and NMF, respectively, on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements over UMAP, tSNE, and NMF, respectively, on clustering in the ARI metric.
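
The idea of building a filtration by varying the number of neighbors rather than a distance threshold can be illustrated with a small sketch. It tracks only the number of connected components (Betti-0) of the kNN graph as k grows; it does not compute persistent Laplacian spectra and is not the authors' kNN-PL implementation. All function names and the toy points are hypothetical.

```python
import math


def knn_edges(points, k):
    """Undirected edges linking every point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def count_components(n, edges):
    """Number of connected components, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


def betti0_over_k(points, max_k):
    """Betti-0 of the kNN graph for k = 1 .. max_k."""
    return [
        count_components(len(points), knn_edges(points, k))
        for k in range(1, max_k + 1)
    ]


# Two well-separated toy clusters of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
curve = betti0_over_k(points, 3)  # [2, 2, 1]
```

Because each point's neighbor ranking is fixed, the edge set for k is contained in the edge set for k + 1, which is what makes the sequence of graphs a filtration. In this toy example the two clusters merge only once k reaches 3.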

  • Article type: Journal Article
    Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.

  • Article type: Journal Article
    Synapse formation is a unique biological phenomenon. The molecular biological perspective on this phenomenon differs from the fractal geometrical one. However, these perspectives are not mutually exclusive; they supplement each other. The cornerstone of the first is a chain of biochemical reactions with the Markov property, that is, a deterministic, conditional, memoryless process ordered in time and in space, in which the consecutive stages are determined by the expression of some regulatory proteins. The coordination of molecular and cellular events leading to synapse formation occurs in fractal time space, that is, a space that is not only the arena of events but also actively influences those events. This time space emerges owing to the coupling of time and space through nonlinear dynamics. The process of synapse formation possesses fractal dynamics with a non-Gaussian probability distribution and a reduced number of molecular Markov chains ready for the transfer of biologically relevant information.

  • Article type: Journal Article
    In this study, we focus on training recurrent spiking neural networks to generate spatiotemporal patterns in the form of closed two-dimensional trajectories. Spike trains in the trained networks are examined in terms of their dissimilarity using the Victor-Purpura distance. We apply algebraic topology methods to the matrices obtained by rank-ordering the entries of the distance matrices, specifically calculating the persistence barcodes and Betti curves. By comparing the features of different types of output patterns, we uncover the complex relations between low-dimensional target signals and the underlying multidimensional spike trains.
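
The Victor-Purpura distance mentioned above is the minimum cost of turning one spike train into another, where inserting or deleting a spike costs 1 and shifting a spike by dt costs q * |dt|. A minimal sketch of the standard dynamic-programming computation follows. It assumes spike times are sorted in ascending order; the function name, the cost parameter value, and the sample trains are illustrative and not taken from the paper.

```python
def victor_purpura(train_a, train_b, q):
    """Victor-Purpura distance between two sorted lists of spike times.

    q is the cost per unit time of shifting a spike; adding or deleting
    a spike costs 1.
    """
    n, m = len(train_a), len(train_b)
    # dist[i][j]: cost of turning the first i spikes of train_a
    # into the first j spikes of train_b.
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)  # delete i spikes
    for j in range(1, m + 1):
        dist[0][j] = float(j)  # insert j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = q * abs(train_a[i - 1] - train_b[j - 1])
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,        # delete a spike from train_a
                dist[i][j - 1] + 1.0,        # insert a spike from train_b
                dist[i - 1][j - 1] + shift,  # shift one spike onto the other
            )
    return dist[n][m]


# Hypothetical spike trains (times in arbitrary units).
d_shift = victor_purpura([1.0, 2.0], [1.5], q=1.0)   # 1.5: one shift, one deletion
d_count = victor_purpura([1.0, 2.0], [1.5], q=0.0)   # 1.0: only spike counts matter
```

Evaluating this function on every pair of spike trains gives the distance matrix whose entries are then rank-ordered before the persistence barcodes and Betti curves are computed.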

  • Article type: Journal Article
    Autism spectrum disorder (ASD) is a pervasive brain development disease. Recently, the incidence rate of ASD has increased year by year and posed a great threat to the lives and families of individuals with ASD. Therefore, the study of ASD has become very important. A suitable feature representation that preserves the intrinsic information of the data while reducing its complexity is vital to the performance of established models. Topological data analysis (TDA) is an emerging and powerful mathematical tool for characterizing shapes and describing intrinsic information in complex data. In TDA, persistence barcodes or diagrams are usually regarded as visual representations of topological features of data. In this paper, the Regional Homogeneity (ReHo) data of subjects obtained from the Autism Brain Imaging Data Exchange (ABIDE) database were used to extract features by using TDA. The average cross-validation accuracy on the ABIDE I database was 95.6%, higher than that of any existing method (the highest accuracy among existing methods was 93.59%). The average accuracy on the ABIDE II database, sampled at the same resolution as ABIDE I, was 96.5%, also higher than that of any existing method (the highest accuracy among existing methods was 75.17%).

  • Article type: Journal Article
    A critical clinical indicator for basal cell carcinoma (BCC) is the presence of telangiectasia (narrow, arborizing blood vessels) within the skin lesions. Many skin cancer imaging processes today exploit deep learning (DL) models for diagnosis, segmentation of features, and feature analysis. To extend automated diagnosis, recent computational intelligence research has also explored the field of Topological Data Analysis (TDA), a branch of mathematics that uses topology to extract meaningful information from highly complex data. This study combines TDA and DL with ensemble learning to create a hybrid TDA-DL BCC diagnostic model. Persistent homology (a TDA technique) is implemented to extract topological features from automatically segmented telangiectasia as well as skin lesions, and DL features are generated by fine-tuning a pre-trained EfficientNet-B5 model. The final hybrid TDA-DL model achieves state-of-the-art accuracy of 97.4% and an AUC of 0.995 on a holdout test of 395 skin lesions for BCC diagnosis. This study demonstrates that telangiectasia features improve BCC diagnosis, and TDA techniques hold the potential to improve DL performance.

  • Article type: Journal Article
    A central goal of biology is to understand how genetic variation produces phenotypic variation, which has been described as a genotype to phenotype (G to P) map. The plant form is continuously shaped by intrinsic developmental and extrinsic environmental inputs, and therefore plant phenomes are highly multivariate and require comprehensive approaches to fully quantify. Yet a common assumption in plant phenotyping efforts is that a few pre-selected measurements can adequately describe the relevant phenome space. Our poor understanding of the genetic basis of root system architecture is at least partially a result of this incongruence. Root systems are complex 3D structures that are most often studied as 2D representations measured with relatively simple univariate traits. In prior work, we showed that persistent homology, a topological data analysis method that does not pre-suppose the salient features of the data, could expand the phenotypic trait space and identify new G to P relations from a commonly used 2D root phenotyping platform. Here we extend the work to entire 3D root system architectures of maize seedlings from a mapping population that was designed to understand the genetic basis of maize-nitrogen relations. Using a panel of 84 univariate traits, persistent homology methods developed for 3D branching, and multivariate vectors of the collective trait space, we found that each method captures distinct information about root system variation as evidenced by the majority of non-overlapping QTL, and hence that root phenotypic trait space is not easily exhausted. The work offers a data-driven method for assessing 3D root structure and highlights the importance of non-canonical phenotypes for more accurate representations of the G to P map.