Keywords: GO enrichment; KEGG enrichment; classification algorithm; feature selection; protein subcellular location; protein-protein interaction network

Source: DOI 10.3389/fgene.2021.783128 (PDF via PubMed)

Abstract:
Given the limitations of current technologies, the subcellular localizations of proteins are difficult to identify. It is therefore necessary to predict the subcellular localizations and intracellular distribution patterns of proteins from their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics. Protein subcellular localizations can be predicted computationally on the basis of sequence and functional characteristics. In this study, models were constructed using the protein-protein interaction (PPI) network, the functional annotations of proteins, and a set of proteins with known subcellular localizations. To build efficient models, several machine learning algorithms were employed, including two feature selection methods and four classification algorithms. Several key proteins and functional terms that may contribute substantially to determining protein subcellular locations were identified. In addition, quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model to use both direct protein annotation information (i.e., functional features) and the STRING-based PPI network (i.e., network features), our computational model can help advance predictive technologies for subcellular localization and provides a new approach for exploring protein subcellular localization patterns and their potential biological importance.
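The abstract does not say exactly how the GO/KEGG functional features are computed. One common formulation for enrichment features on a PPI network, assumed here purely for illustration and not necessarily the authors' method, scores a protein p against a term t by the hypergeometric probability of observing at least k annotated proteins among p's n network neighbors, where M of the N proteins in the background carry t. The score is then -log10 of that probability. The function names, protein IDs and toy annotations below are hypothetical.

```python
import math


def hypergeom_sf(k, total, n_annotated, n_drawn):
    """P(X >= k) for X ~ Hypergeometric(total, n_annotated, n_drawn)."""
    denom = math.comb(total, n_drawn)
    upper = min(n_annotated, n_drawn)
    # Sum the probability mass of every overlap size from k up to the maximum possible.
    tail = sum(
        math.comb(n_annotated, i) * math.comb(total - n_annotated, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


def enrichment_score(protein, term, ppi, annotations, total):
    """Hypothetical enrichment score of `term` among the PPI neighbors of `protein`."""
    neighbors = ppi.get(protein, set())
    annotated = {p for p, terms in annotations.items() if term in terms}
    k = len(neighbors & annotated)  # neighbors that carry the term
    p_value = hypergeom_sf(k, total, len(annotated), len(neighbors))
    # Clamp to avoid log10(0) if the tail probability underflows to 0.0.
    return -math.log10(max(p_value, 1e-300))


def enrichment_features(protein, terms, ppi, annotations, total):
    """One feature per term, in the order given by `terms`."""
    return [enrichment_score(protein, t, ppi, annotations, total) for t in terms]


if __name__ == "__main__":
    # Toy data (invented): 10 proteins in the background, P1 has 3 neighbors.
    ppi = {"P1": {"P2", "P3", "P4"}}
    annotations = {"P2": {"GO:0005634"}, "P3": {"GO:0005634"}, "P5": {"GO:0005737"}}
    score = enrichment_score("P1", "GO:0005634", ppi, annotations, 10)
    print(round(score, 3))  # 1.176
```

Two of P1's three neighbors share the term while only two of ten proteins carry it, so the tail probability is 8/120 and the score is about 1.176; a term absent from the neighborhood scores 0. Vectors built this way (optionally concatenated with network embeddings) would then be passed to feature selection and classification.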