Keywords: Evaluation; Intrinsic disorder; Meta-prediction; Nucleic acid binding; Protein function prediction; Protein structure prediction; Secondary structure; Solvent accessibility; Webserver

Source: DOI:10.1016/j.csbj.2022.05.003

Abstract:
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
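The abstract does not say how the structure-trained and disorder-trained predictions are combined. As a minimal sketch of what such a meta-prediction could look like, the example below assumes the simplest possible scheme: a weighted average of two per-residue binding propensity profiles, followed by a cutoff. The function names, the `weight` parameter, and the `threshold` value are hypothetical and are not taken from the study.

```python
from typing import Sequence


def combine_scores(
    structure_trained: Sequence[float],
    disorder_trained: Sequence[float],
    weight: float = 0.5,
) -> list[float]:
    """Weighted average of two per-residue propensity profiles.

    Assumption: both profiles are scaled to [0, 1] and aligned to the
    same protein sequence. This averaging scheme is illustrative only;
    the abstract does not describe the combination method.
    """
    if len(structure_trained) != len(disorder_trained):
        raise ValueError("profiles must cover the same residues")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [
        weight * s + (1.0 - weight) * d
        for s, d in zip(structure_trained, disorder_trained)
    ]


def call_binding(scores: Sequence[float], threshold: float = 0.5) -> list[bool]:
    """Label residues as binding when the score reaches the cutoff.

    The 0.5 cutoff is a hypothetical default, not a value from the study.
    """
    return [score >= threshold for score in scores]


if __name__ == "__main__":
    # Made-up propensities for a three-residue toy sequence.
    structure_profile = [0.9, 0.1, 0.4]
    disorder_profile = [0.5, 0.3, 0.8]
    combined = combine_scores(structure_profile, disorder_profile)
    print(combined)
    print(call_binding(combined))
```

An equal-weight average is used here only because it needs no training data. A combination of this kind would normally be tuned and assessed on a held-out dataset.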