Keywords: Biological network; Clustering method; Correlation network analysis; Dimension reduction; Spectral clustering

MeSH: Cluster Analysis; Gene Regulatory Networks; Protein Interaction Maps; Mathematical Concepts; Humans; Computational Biology; Algorithms; Models, Biological; Gene Expression Profiling / statistics & numerical data methods

Source: DOI:10.1007/s11538-024-01335-8

Abstract:
The growing complexity of biological data has spurred the development of innovative computational techniques to extract meaningful information and uncover hidden patterns within vast datasets. Biological networks, such as gene regulatory networks and protein-protein interaction networks, hold critical insights into biological features' connections and functions. Integrating and analyzing high-dimensional data, particularly in gene expression studies, stands prominent among the challenges in deciphering these networks. Clustering methods play a crucial role in addressing these challenges, with spectral clustering emerging as a potent unsupervised technique considering intrinsic geometric structures. However, spectral clustering's user-defined cluster number can lead to inconsistent and sometimes orthogonal clustering regimes. We propose the Multi-layer Bundling (MLB) method to address this limitation, combining multiple prominent clustering regimes to offer a comprehensive data view. We call the outcome clusters "bundles". This approach refines clustering outcomes, unravels hierarchical organization, and identifies bridge elements mediating communication between network components. By layering clustering results, MLB provides a global-to-local view of biological feature clusters enabling insights into intricate biological systems. Furthermore, the method enhances bundle network predictions by integrating the bundle co-cluster matrix with the affinity matrix. The versatility of MLB extends beyond biological networks, making it applicable to various domains where understanding complex relationships and patterns is needed.
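To make the layering idea concrete, the sketch below shows one way label vectors from several clustering runs (for example, spectral clustering with different cluster numbers) could be combined. It is an illustration under stated assumptions, not the authors' implementation: the function names, the rule that a bundle is a set of items sharing a cluster in every layer, the co-cluster matrix as the fraction of layers in which two items share a cluster, and the element-wise product with the affinity matrix are all hypothetical choices that the abstract does not specify.

```python
from collections import defaultdict


def bundles_from_layers(layers):
    """Group items that share a cluster label in every layer (assumed bundle rule)."""
    n = len(layers[0])
    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels across layers acts as the item's signature.
        signature = tuple(layer[i] for layer in layers)
        groups[signature].append(i)
    return sorted(groups.values())


def co_cluster_matrix(layers):
    """Fraction of layers in which items i and j fall in the same cluster."""
    n = len(layers[0])
    co = [[0.0] * n for _ in range(n)]
    for layer in layers:
        for i in range(n):
            for j in range(n):
                if layer[i] == layer[j]:
                    co[i][j] += 1.0 / len(layers)
    return co


def weight_affinity(affinity, co):
    """Element-wise product of affinity and co-cluster matrices (an assumption)."""
    return [[a * c for a, c in zip(row_a, row_c)]
            for row_a, row_c in zip(affinity, co)]


# Two toy clustering regimes over six items; they are not nested:
# the second layer's cluster 1 straddles the first layer's boundary.
layers = [
    [0, 0, 0, 1, 1, 1],  # regime with 2 clusters
    [0, 0, 1, 1, 2, 2],  # regime with 3 clusters
]
bundles = bundles_from_layers(layers)
co = co_cluster_matrix(layers)

# Hypothetical uniform affinity, used only to show the weighting step.
affinity = [[0.8] * 6 for _ in range(6)]
weighted = weight_affinity(affinity, co)
```

In this toy input, items 2 and 3 end up alone because each changes company between the two regimes, while items 0 and 1 stay together in both. Whether such items correspond to the "bridge elements" mentioned in the abstract depends on details of the method not given here.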