Keywords: Gaussian mixture model (GMM); Gene expressions; Gene regulatory network (GRN); Protein-protein interaction networks; Transitive protein-protein interactions

MeSH: Bayes Theorem; Gene Expression Profiling; Gene Regulatory Networks; Normal Distribution; Protein Interaction Maps; Systems Biology / methods

Source: DOI:10.1186/s12918-019-0695-x

Abstract:
Systematic fusion of multiple data sources for Gene Regulatory Network (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse, and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework for fusing GE data and transitive protein interaction data to coherently build GRN.
We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, a heuristic method is proposed to extend sparse PPIN by incorporating transitive linkages. We then propose a novel way to score extended protein interactions by combining topological properties of PPIN and correlations of GE. Following this, GE data and extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) in order to identify gene regulatory pathways and refine interaction scores, which are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data by using the structural priors derived from GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN in predicting transitive protein interactions and its effectiveness in improving the coverage and accuracy of the proposed method of fusing PPIN and GE to build GRN.
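To illustrate what "soft-clustering with overlapping memberships" means in a GMM, the following minimal sketch computes posterior cluster memberships for one expression value by Bayes' theorem. It is not the authors' implementation: the mixture parameters are assumed to be already fitted, the data are one-dimensional for brevity, and the function names `gaussian_pdf` and `soft_memberships` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_memberships(x, components):
    """Posterior probability of each mixture component given x.

    `components` is a list of (weight, mean, std) tuples; these are
    assumed to come from an already fitted mixture (hypothetical input).
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    if total == 0.0:
        # Illustrative choice only: if every density underflows to zero,
        # fall back to uniform memberships.
        return [1.0 / len(components)] * len(components)
    return [v / total for v in weighted]
```

A value lying midway between two equally weighted components receives a membership of 0.5 in each, which is the overlap that hard clustering would discard.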
The GE and PPIN fusion model outperforms both the state-of-the-art single data source models (CLR, GENIE3, TIGRESS) as well as existing fusion models under various constraints.
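The abstract does not spell out the heuristic for extending the PPIN or the scoring formula, so the sketch below only illustrates the general idea under stated assumptions. A candidate transitive link A-C is proposed when A-B and B-C are known but A-C is not, and it is scored as the Jaccard overlap of the two neighborhoods (a topological property) multiplied by the absolute Pearson correlation of the two expression profiles. Both the candidate rule and the score are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations


def extend_ppin(edges):
    """Return (neighbors, candidates) for an undirected interaction list.

    A candidate is a pair (a, c) sharing at least one neighbor but not
    directly connected (assumed transitive-link rule).
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    known = {frozenset(e) for e in edges}
    candidates = set()
    for nbrs in neighbors.values():
        for a, c in combinations(sorted(nbrs), 2):
            if frozenset((a, c)) not in known:
                candidates.add((a, c))
    return neighbors, sorted(candidates)


def pearson(xs, ys):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def score_edge(a, c, neighbors, expression):
    """Assumed score: neighborhood Jaccard overlap times |Pearson r|."""
    shared = neighbors[a] & neighbors[c]
    union = neighbors[a] | neighbors[c]
    topology = len(shared) / len(union)
    return topology * abs(pearson(expression[a], expression[c]))
```

With the chain A-B, B-C, C-D, this rule proposes A-C and B-D as candidate links. In the described framework such scores would only be a starting point, since the GHMM step refines them further.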