Topological data analysis

  • Article type: Journal Article
    This paper is motivated by the need to stabilise the impact of deep learning (DL) training for medical image analysis on the conditioning of convolution filters, in relation to model overfitting and robustness. We present a simple strategy to reduce the condition numbers of square matrices and investigate its effect on the spatial distributions of point clouds of well- and ill-conditioned matrices. For a square matrix, the SVD surgery strategy works by: (1) computing its singular value decomposition (SVD), (2) changing a few of the smaller singular values relative to the largest one, and (3) reconstructing the matrix by reverse SVD. Applying SVD surgery to CNN convolution filters during training acts as spectral regularisation of the DL model without requiring the learning of extra parameters. The fact that the further a matrix is from the non-invertible matrices, the lower its condition number, suggests that the spatial distributions of square matrices and those of their inverses are correlated with their condition number distributions. We examine this assertion empirically by showing that applying various versions of SVD surgery to point clouds of matrices brings their persistence diagrams (PDs) closer to those of the point clouds of their inverses.
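    The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the abstract does not say how the smaller singular values are changed, so the clamping rule used here (raise every singular value below a floor of largest / max_condition) and the names svd_surgery and max_condition are hypothetical, not the authors' method.

```python
import numpy as np

def svd_surgery(matrix, max_condition=10.0):
    """Clamp small singular values so the 2-norm condition number is at most max_condition.

    Assumed rule for illustration only; the source does not specify the exact change.
    """
    u, s, vt = np.linalg.svd(matrix)      # step 1: decompose (s is sorted in descending order)
    floor = s[0] / max_condition          # smallest value allowed, relative to the largest
    s_new = np.maximum(s, floor)          # step 2: raise only the values below the floor
    return u @ np.diag(s_new) @ vt        # step 3: reconstruct from the modified spectrum

rng = np.random.default_rng(0)
filt = rng.standard_normal((3, 3))        # stand-in for a 3x3 convolution filter
repaired = svd_surgery(filt, max_condition=10.0)
print(np.linalg.cond(filt), np.linalg.cond(repaired))
```

    Because only the singular values change, the left and right singular vectors of the filter are preserved, which is one reason a rule of this kind adds no trainable parameters.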






  • Article type: Journal Article
    Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information theoretical distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex measured with the relative entropy (Kullback-Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The interest of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics.
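    To make the relationship between relative entropy and Bregman divergences concrete, the sketch below evaluates the general formula D_F(p || q) = F(p) - F(q) - <grad F(q), p - q> for two generators: negative Shannon entropy, which recovers the Kullback-Leibler divergence on the probability simplex, and half the squared Euclidean norm, which recovers half the squared Euclidean distance. The function names are illustrative and are not taken from the software described in the abstract; all coordinates are assumed strictly positive.

```python
import math

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner

def neg_entropy(x):
    # Negative Shannon entropy; requires strictly positive coordinates.
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def identity_grad(x):
    return list(x)

def kl(p, q):
    """Kullback-Leibler divergence, computed directly for comparison."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
d_kl = bregman(neg_entropy, neg_entropy_grad, p, q)    # agrees with kl(p, q)
d_euc = bregman(half_sq_norm, identity_grad, p, q)     # half the squared Euclidean distance
print(d_kl, kl(p, q), d_euc)
```

    Note that the divergence is not symmetric in general: swapping p and q in the entropy case gives a different value, which is one way these spaces differ from metric spaces.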






  • Article type: Journal Article
    OBJECTIVE: Patients with schizophrenia typically exhibit symptoms of disorganized thought and display concreteness and over-inclusion in verbal reports, depending on the level of abstraction. While concreteness and over-inclusion may appear contradictory, the underlying psychopathology that explains these symptoms remains unclear. In the current study, we used functional magnetic resonance imaging with an encoding modeling approach to examine how concepts of various words, represented as brain activity, are anomalously connected at different levels of abstraction in patients with schizophrenia.
    METHODS: Fourteen individuals diagnosed with schizophrenia and 17 healthy controls underwent functional magnetic resonance imaging to measure brain activity representing concepts of various words. We used a persistent homology (PH) method to analyze the topological structures of word representations in schizophrenia patients, healthy controls, and random data, across different levels of abstraction by varying dissimilarity scales in the representation space.
    RESULTS: The results revealed that patients with schizophrenia exhibited more homogeneous word relationships across different levels of abstraction compared with healthy controls. Additionally, topological structures exhibited a shift toward a random network structure in patients with schizophrenia compared with controls. The PH method successfully distinguished semantic representations of patients with schizophrenia from those of controls.
    CONCLUSIONS: The current results provide an explanation for the mechanisms underlying the deficits in abstraction ability observed in schizophrenia. The isotopic connection of individual concepts reflects both the reduction of contextual connections at a semantically fine-grained scale and the absence of clear boundaries between related concepts at a coarse scale, which lead to concreteness and over-inclusion, respectively.






  • Article type: Journal Article
    Patients are complex and heterogeneous; clinical data sets are complicated by noise, missing data, and the presence of mixed-type data. Using such data sets requires understanding the high-dimensional "space of patients", composed of all measurements that define all relevant phenotypes. The current state of the art merely defines spatial groupings of patients using cluster analyses. Our goal is to apply topological data analysis (TDA), a new unsupervised technique, to obtain a more complete understanding of patient space. We applied TDA to a space of 266 previously untreated patients with Chronic Lymphocytic Leukemia (CLL), using the "daisy" metric to compute distances between clinical records. We found clear evidence for both loops and voids in the CLL data. To interpret these structures, we developed novel computational and graphical methods. The most persistent loop and the most persistent void can be explained using three dichotomized, prognostically important factors in CLL: IGHV somatic mutation status, beta-2 microglobulin, and Rai stage. In conclusion, patient space turns out to be richer and more complex than current models suggest. TDA could become a powerful tool in a researcher's arsenal for interpreting high-dimensional data by providing novel insights into biological processes and improving our understanding of clinical and biological data sets.
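    A distance between mixed-type clinical records can be sketched as follows, assuming the "daisy" metric refers to the Gower-style dissimilarity computed by the daisy function in the R cluster package. The sketch averages per-variable dissimilarities and skips variables that are missing in either record. The records, variable kinds, and the age range below are invented for illustration and are not data from the study; ordinal variables are treated as nominal here for simplicity.

```python
def gower_distance(a, b, kinds, ranges):
    """Average per-variable dissimilarity, skipping variables missing (None) in either record.

    kinds[i] is "numeric" or "nominal"; ranges[i] is the observed range of a numeric variable.
    """
    total, used = 0.0, 0
    for x, y, kind, value_range in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue                               # missing data: variable contributes nothing
        if kind == "numeric":
            total += abs(x - y) / value_range      # scaled to [0, 1] by the variable's range
        else:
            total += 0.0 if x == y else 1.0        # simple matching for categories
        used += 1
    if used == 0:
        raise ValueError("no variable observed in both records")
    return total / used

# Hypothetical records: (age, IGHV status, stage group)
kinds = ("numeric", "nominal", "nominal")
ranges = (40.0, None, None)                        # assumed age range in the cohort
patient_a = (60.0, "mutated", "low")
patient_b = (70.0, "unmutated", "low")
patient_c = (None, "mutated", "high")              # age missing

d_ab = gower_distance(patient_a, patient_b, kinds, ranges)
d_ac = gower_distance(patient_a, patient_c, kinds, ranges)
print(d_ab, d_ac)
```

    The resulting pairwise distances are what a persistent homology computation would take as input in place of Euclidean distances.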






  • Article type: Journal Article
    Advances in spatial proteomics and protein colocalization are a driving force in the understanding of cellular mechanisms and their influence on biological processes. New methods in the field of spatial proteomics call for the development of algorithms and open up new avenues of research. The newly introduced Molecular Pixelation (MPX) provides spatial information on surface proteins and their relationship with each other in single cells. This allows for in silico representation of neighborhoods of membrane proteins as graphs. To analyze this new data modality, we adapted local assortativity in networks of MPX single-cell graphs and created a method that captures detailed information on the spatial relationships of proteins. The method can evaluate the pairwise colocalization of proteins and assess higher-order similarity to investigate the colocalization of multiple proteins at the same time. We evaluated the method using publicly available MPX datasets in which T cells were treated with a chemokine to study uropod formation. We demonstrate that adjusted local assortativity detects the effects of the stimuli at both single- and multiple-marker levels, which enhances our understanding of uropod formation. We also applied our method to cancerous B-cell lines treated with a therapeutic antibody. With the adjusted local assortativity, we recapitulated the effect of rituximab on the polarity of CD20. Together with MPX, our computational method improves our understanding of the formation of cell polarity and protein colocalization under stimuli and advances overall insight into immune reactions and the reorganization of cell surface proteins, which in turn allows the design of novel therapies. We foresee its applicability to other types of biological spatial data when represented as undirected graphs.






  • Article type: Journal Article
    Wearable sensor data analysis with persistence features generated by topological data analysis (TDA) has achieved great success in various applications. However, extracting topological features requires large computational and time resources. In this paper, our approach uses knowledge distillation (KD) with multiple teacher networks, trained on the raw time series and on persistence images generated by TDA, respectively. However, directly transferring knowledge from teacher models that take inputs with different characteristics to the student model results in a knowledge gap and limited performance. To address this problem, we introduce a robust framework that integrates multimodal features from two different teachers and enables a student to learn desirable knowledge effectively. To account for statistical differences between modalities, an entropy-based constrained adaptive weighting mechanism is used to automatically balance the effects of the teachers and encourage the student model to adequately adopt the knowledge from both. To assimilate dissimilar structural information generated by models of different styles for distillation, batch and channel similarities within a mini-batch are used. We demonstrate the effectiveness of the proposed method on wearable sensor data.
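    As a rough illustration of distilling from two teachers, the sketch below combines two temperature-softened KL terms with a fixed weight. This is the generic distillation loss, not the framework proposed in the abstract: the entropy-based adaptive weighting and the batch and channel similarity terms are not reproduced, and the names two_teacher_kd_loss, weight_a, and temperature, as well as the logit values, are assumptions made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def two_teacher_kd_loss(student_logits, teacher_a_logits, teacher_b_logits,
                        weight_a=0.5, temperature=4.0):
    """Fixed-weight mix of two distillation terms (weight_a is a stand-in for adaptive weighting)."""
    s = softmax(student_logits, temperature)
    loss_a = kl_divergence(softmax(teacher_a_logits, temperature), s)
    loss_b = kl_divergence(softmax(teacher_b_logits, temperature), s)
    # The T^2 factor is the usual rescaling that keeps gradient magnitudes
    # comparable across temperatures.
    return temperature ** 2 * (weight_a * loss_a + (1.0 - weight_a) * loss_b)

student = [1.0, 0.5, -0.5]
teacher_raw = [2.0, 0.1, -1.0]       # hypothetical teacher trained on raw time series
teacher_tda = [1.5, 1.0, -2.0]       # hypothetical teacher trained on persistence images
loss = two_teacher_kd_loss(student, teacher_raw, teacher_tda)
print(loss)
```

    At inference time only the student is evaluated, so no persistence images need to be computed, which is the practical motivation stated in the abstract.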






  • Article type: Journal Article
    BACKGROUND: Human placenta hydrolysates (HPH), the study of which was initiated by the scientific school of Vladimir P. Filatov, are currently being investigated using modern proteomic technologies. HPH is a promising tool for maintaining the function of mitochondria and regenerating tissues and organs with a high content of mitochondria (liver, heart muscle, skeletal muscles, etc.). The molecular mechanisms of action of HPH are practically not studied.
    OBJECTIVE: Identification of peptides that support mitochondrial function in HPH (Laennec, produced by Japan Bioproducts).
    METHODS: Data on the chemical structure of the peptides were collected through a mass spectrometric experiment. Then, to establish the amino acid sequences of the peptides, de novo peptide sequencing algorithms based on the mathematical theory of topological and metric analysis of chemographs were applied. Bioinformatic analysis of the peptide composition of HPH was carried out using the integral protein annotation method.
    RESULTS: The biological functions of 41 peptides in the composition of HPH have been identified and described. Among the target proteins, the activity of which is regulated by the identified peptides and significantly affects the function of mitochondria, are caspases (CASP1, CASP3, CASP4) and other proteins regulating apoptosis (BCL2, CANPL1, PPARA), MAP kinases (MAPK1, MAPK3, MAPK4, MAPK8, MAPK9, MAPK10, MAPK14), AKT1/GSK3B/MTOR cascade kinases, and a number of other target proteins (ADGRG6 receptor, inhibitor of NF-κB kinase IKKE, pyruvate dehydrogenase 2/3/4, SIRT1 sirtuin deacetylase, ULK1 kinase).
    CONCLUSIONS: HPH peptides have been identified that promote inhibition of mitochondrial pore formation, apoptosis, and excessive mitochondrial autophagy under conditions of oxidative/toxic stress, chronic inflammation, and/or hyperinsulinemia.






  • Article type: Journal Article
    Persistent homology (PH) is an approach to topological data analysis (TDA) that computes multi-scale topologically invariant properties of high-dimensional data that are robust to noise. While PH has revealed useful patterns across various applications, computational requirements have limited applications to small data sets of a few thousand points. We present Dory, an efficient and scalable algorithm that can compute the persistent homology of sparse Vietoris-Rips complexes on larger data sets, up to and including dimension two and over the field Z2. As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set containing approximately three million points. Extant algorithms were unable to process it, whereas Dory processed it within five minutes, using less than five GB of memory. Results show that the topology of the human genome changes significantly upon treatment with auxin, a molecule that degrades cohesin, corroborating the hypothesis that cohesin plays a crucial role in loop formation in DNA.
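    For readers new to persistent homology, the dimension-zero part of a Vietoris-Rips computation can be sketched with a union-find structure: every point is born as its own component at scale 0, and a component dies when an edge first merges it into another. This is a textbook illustration and not the Dory algorithm, which also handles dimensions one and two and sparse inputs at far larger scale. The function name h0_persistence and the sample points are assumptions, and the pairwise distance is used as the scale (some conventions use half of it).

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Return the death scales of H0 classes in the Vietoris-Rips filtration (all born at 0)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge merges two components: one class dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths                           # n - 1 finite deaths; one class never dies

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
deaths = h0_persistence(points)
print(deaths)
```

    The single large death value in this example reflects the gap between the two clusters of points; over Z2 or any other field the dimension-zero result is the same, since it only counts connected components.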






  • Article type: Journal Article
    Childhood maltreatment may adversely affect brain development and consequently influence behavioral, emotional, and psychological patterns during adulthood. In this study, we propose an analytical pipeline for modeling the altered topological structure of brain white matter in maltreated and typically developing children. We perform topological data analysis (TDA) to assess the alteration in the global topology of the brain white matter structural covariance network among children. We use persistent homology, an algebraic technique in TDA, to analyze topological features in the brain covariance networks constructed from structural magnetic resonance imaging and diffusion tensor imaging. We develop a novel framework for statistical inference based on the Wasserstein distance to assess the significance of the observed topological differences. Using these methods in comparing maltreated children with a typically developing control group, we find that maltreatment may increase homogeneity in white matter structures and thus induce higher correlations in the structural covariance; this is reflected in the topological profile. Our findings strongly suggest that TDA can be a valuable framework to model altered topological structures of the brain. The MATLAB codes and processed data used in this study can be found at
    We employ topological data analysis (TDA) to investigate altered topological structures in the white matter of children who have experienced maltreatment. Persistent homology in TDA is utilized to quantify topological differences between typically developing children and those subjected to maltreatment, using magnetic resonance imaging and diffusion tensor imaging data. The Wasserstein distance is computed between topological features to assess disparities in brain networks. Our findings demonstrate that persistent homology effectively characterizes the altered dynamics of white matter in children who have suffered maltreatment.
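    One simple case of the Wasserstein distance between topological features can be sketched as follows. If two networks have the same number of nodes, their dimension-zero barcodes contain equally many death values with a common birth value, and on the real line the optimal matching pairs the sorted values in order. The sketch relies on that equal-size assumption and ignores matches to the diagonal that general persistence diagrams allow; it is not the inference framework developed in the study, and the function name and numbers are invented for illustration.

```python
def wasserstein_sorted(values_a, values_b, order=2):
    """q-Wasserstein distance between two equal-size sets of real values via sorted matching."""
    if len(values_a) != len(values_b):
        raise ValueError("this shortcut needs equally many values in both sets")
    pairs = zip(sorted(values_a), sorted(values_b))
    return sum(abs(x - y) ** order for x, y in pairs) ** (1.0 / order)

# Hypothetical 0D death values from two group-level covariance networks
deaths_control = [0.9, 0.2, 0.5]
deaths_maltreated = [0.3, 0.1, 0.4]
distance = wasserstein_sorted(deaths_control, deaths_maltreated)
print(distance)
```

    A significance assessment would then compare this observed distance with its distribution under resampled group labels, for example by a permutation scheme.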






  • Article type: Journal Article
    Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on 3D structure comparison are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal that protein structures may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, and points to future developments of topological analysis in the life sciences.
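    Comparing two distance matrices over the same set of proteins can be sketched by correlating their upper triangles. The abstract does not state which correlation measure was used, so the Pearson coefficient here is an assumption, and the two small matrices are invented for illustration rather than taken from the study.

```python
import math

def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical pairwise distances for three proteins of one family
ph_dist = [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]       # from structures via PH
phylo_dist = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.6], [0.8, 0.6, 0.0]]    # from sequences
r = pearson(upper_triangle(ph_dist), upper_triangle(phylo_dist))
print(r)
```

    Because the entries of a distance matrix are not independent, a significance test for such a correlation is usually done by permuting the rows and columns of one matrix (a Mantel-style test) rather than with the standard Pearson p-value.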





