Keywords: S. pombe; cell phenotype; computational biology; computational prediction; functional genomics; gene ontology; genetics; genomics; machine learning; systems biology; unknown protein

MeSH: Humans; Phenomics; Schizosaccharomyces pombe Proteins / genetics; Phenotype; Schizosaccharomyces / genetics; Machine Learning

Source: DOI:10.7554/eLife.88229

Abstract:
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.