Keywords: Feature selection; IL-13; IL-13 peptides; Machine learning; Peptide prediction; mRMR

MeSH: Humans; Interleukin-13; Bayes Theorem; COVID-19; Peptides; Machine Learning

Source: DOI:10.1186/s12859-023-05248-6

Abstract:
BACKGROUND: Inflammatory mediators wreak havoc in several diseases, including the novel Coronavirus disease 2019 (COVID-19), and generally correlate with disease severity. Interleukin-13 (IL-13) is a pleiotropic cytokine known to be associated with airway inflammation in asthma and reactive airway diseases, as well as with neoplastic and autoimmune diseases. The recent association of IL-13 with COVID-19 severity has sparked renewed interest in this cytokine. Therefore, characterizing new molecules that can regulate IL-13 induction might lead to novel therapeutics.
RESULTS: Here, we present an improved predictor of IL-13-inducing peptides. The positive and negative datasets were obtained from a recent study (IL13Pred), and the Pfeature algorithm was used to compute peptide features. Whereas the state-of-the-art method used a regularization-based feature selection technique (a linear support vector classifier with the L1 penalty), we used a multivariate feature selection technique, minimum redundancy maximum relevance (mRMR), to obtain non-redundant and highly relevant features. In the proposed method, improved IL-13 prediction (iIL13Pred), mRMR feature selection is instrumental in choosing the most discriminatory features of IL-13-inducing peptides, which improves performance. We investigated seven common machine learning classifiers (Decision Tree, Gaussian Naïve Bayes, k-Nearest Neighbour, Logistic Regression, Support Vector Machine, Random Forest, and extreme gradient boosting) for classifying IL-13-inducing peptides. We report improved AUC and MCC scores of 0.83 and 0.33, respectively, on validation data compared with the current method.
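To make the idea behind mRMR concrete, the following is a minimal sketch of a greedy selection loop in plain Python. It is not the authors' implementation: it assumes the features have already been discretized (real peptide descriptors such as those from Pfeature are continuous and would need binning first), it uses the "difference" form of the criterion (relevance minus mean redundancy), and the feature names and toy values are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedy mRMR, difference form (an illustrative variant).

    At each step, pick the unselected feature that maximises
    I(feature; label) minus the mean I(feature; s) over selected features s.
    `columns` maps a feature name to its list of discretized values.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    selected = []
    while len(selected) < min(k, len(columns)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max((n for n in columns if n not in selected), key=score)
        selected.append(best)
    return selected


# Hypothetical toy data: 8 peptides, binary label (1 = IL-13 inducer).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "f_strong":    [0, 0, 0, 1, 1, 1, 1, 1],  # most informative about the label
    "f_near_copy": [0, 0, 1, 1, 1, 1, 1, 1],  # informative but redundant with f_strong
    "f_weak":      [0, 1, 0, 0, 1, 0, 1, 1],  # weaker, but adds new information
}
selected = mrmr_select(columns, labels, k=2)
print(selected)
```

On this toy data, a ranking by relevance alone would place f_near_copy second, whereas the redundancy penalty leads the loop to prefer f_weak. That is the behaviour the abstract refers to as obtaining non-redundant and highly relevant features.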
CONCLUSIONS: Extensive benchmarking experiments suggest that the proposed method (iIL13Pred) could provide better sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUCROC), and Matthews correlation coefficient (MCC) than the existing state-of-the-art approach (IL13Pred), both on the validation dataset and on an external dataset comprising experimentally validated IL-13-inducing peptides. Additionally, the experiments were performed with an increased number of experimentally validated training datasets to obtain a more robust model. A user-friendly web server (www.soodlab.com/iil13pred) has also been designed to facilitate rapid screening of IL-13-inducing peptides.
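For readers less familiar with the metrics named above, the sketch below shows how they are conventionally computed for a binary classifier. It uses only the Python standard library, assumes both classes are present in the labels, and runs on made-up labels and scores; none of the numbers come from the study.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and MCC from 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
        # MCC is conventionally set to 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 inducers, 5 non-inducers, decision threshold 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.05, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

In this imbalanced toy example, accuracy is 0.75 while MCC is about 0.47, which illustrates why MCC is often reported alongside accuracy when positive and negative peptides are not equally represented.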