Keywords: Consensus sequence; DNA-binding proteins; Double-stranded DNA-binding proteins; K-segmentation; Single-stranded DNA-binding proteins; Support vector machine

MeSH: Amino Acid Sequence; Computational Biology / methods; Consensus Sequence; DNA-Binding Proteins / chemistry; Databases, Protein; Datasets as Topic; Support Vector Machine

Source: DOI:10.1016/j.ab.2019.113494

Abstract:
Identification of DNA-binding proteins (DNA-BPs) is an active topic in protein science because of their key role in various biological processes. These processes depend strongly on the type of DNA-binding protein. DNA-BPs are classified into single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs). SSBs are mainly involved in DNA recombination, replication, and repair, while DSBs regulate transcription, DNA cleavage, and chromosome packaging. Despite this significance, few methods have been proposed for discriminating SSBs from DSBs, so more predictors with favorable performance are needed. In this work, we present a predictor called SDBP-Pred, with a novel feature descriptor named consensus sequence-based K-segmentation position-specific scoring matrix (CSKS-PSSM). We encode the local discriminative features concealed in the PSSM via a K-segmentation strategy, and the global features by applying the notion of the consensus sequence. The resulting feature vector is then input to a support vector machine (SVM) with linear, polynomial, and radial basis function (RBF) kernels. Our model with SVM-RBF achieved higher accuracies than the recent method on all three tests: jackknife, 10-fold cross-validation, and independent tests. These results indicate that SDBP-Pred outperforms the existing methods reported in the literature so far.
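The abstract does not spell out how CSKS-PSSM is computed, so the following is only a minimal sketch of the general idea under stated assumptions: the PSSM is an L x 20 matrix with columns in the usual PSI-BLAST order, the local part averages each column within K contiguous row segments, and the global part is the amino acid composition of a consensus sequence formed by taking the highest-scoring residue at each position. The function names (`k_segment_features`, `consensus_sequence`, `consensus_composition`, `csks_like_features`) are hypothetical and are not taken from SDBP-Pred.

```python
from typing import List, Sequence

# Assumed column order; this is the usual PSI-BLAST PSSM layout.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

Matrix = Sequence[Sequence[float]]


def k_segment_features(pssm: Matrix, k: int) -> List[float]:
    """Local part (assumption): split the rows into k contiguous segments
    and average every column inside each segment, giving 20 * k values."""
    n = len(pssm)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the sequence length")
    features: List[float] = []
    for s in range(k):
        # Integer arithmetic keeps segments near-equal and never empty.
        segment = pssm[s * n // k:(s + 1) * n // k]
        for col in range(len(AMINO_ACIDS)):
            features.append(sum(row[col] for row in segment) / len(segment))
    return features


def consensus_sequence(pssm: Matrix) -> str:
    """Pick the highest-scoring amino acid at each position (first one on ties)."""
    letters = []
    for row in pssm:
        best = max(range(len(AMINO_ACIDS)), key=lambda j: row[j])
        letters.append(AMINO_ACIDS[best])
    return "".join(letters)


def consensus_composition(pssm: Matrix) -> List[float]:
    """Global part (assumption): frequency of each amino acid in the consensus."""
    consensus = consensus_sequence(pssm)
    return [consensus.count(aa) / len(consensus) for aa in AMINO_ACIDS]


def csks_like_features(pssm: Matrix, k: int) -> List[float]:
    """Concatenate local and global parts into one vector of length 20 * k + 20."""
    return k_segment_features(pssm, k) + consensus_composition(pssm)


def one_hot_row(index: int) -> List[float]:
    """Toy PSSM row scoring 1.0 for a single amino acid and 0.0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(len(AMINO_ACIDS))]


# Toy 4-residue "PSSM", for illustration only (not real data).
toy_pssm = [one_hot_row(0), one_hot_row(0), one_hot_row(1), one_hot_row(2)]
vector = csks_like_features(toy_pssm, k=2)
print(consensus_sequence(toy_pssm), len(vector))  # prints: AARN 60
```

A vector like this could then be passed to an RBF-kernel SVM, for example scikit-learn's `SVC(kernel="rbf")`. The value of K, any score normalization, and the SVM hyperparameters used by the authors are not stated in the abstract and would have to come from the paper itself.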