Keywords: Asthma genetics machine learning prediction single-nucleotide variants

Source: DOI:10.1016/j.jacig.2024.100282

Abstract:
Asthma is a chronic inflammatory disease of the airways that is heterogeneous and multifactorial, making its accurate characterization a complex process. Therefore, identifying the genetic variations associated with asthma and discovering the molecular interactions between the omics that confer risk of developing this disease will help us to unravel the biological pathways involved in its pathogenesis.
We sought to develop a predictive genetic panel for asthma using machine learning methods.
We tested 3 variable selection methods: Boruta's algorithm, the top 200 genome-wide association study markers according to their respective P values, and an elastic net regression. Ten different algorithms were chosen for the classification tests. A predictive panel was built on the basis of joint scores between the classification algorithms.
Two variable selection methods, Boruta and genome-wide association studies, were statistically similar in terms of the average accuracies generated, whereas elastic net had the worst overall performance. The predictive genetic panel was completed with 155 single-nucleotide variants, with 91.18% accuracy, 92.75% sensitivity, and 89.55% specificity using the support vector machine algorithm. The markers used range from known single-nucleotide variants to those not previously described in the literature. Our study shows potential in creating genetic prediction panels with tailored penalties per marker, aiding in the identification of optimal machine learning methods for intricate results.
This method is able to classify asthma and nonasthma effectively, proving its potential utility in clinical prediction and diagnosis.
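For readers less familiar with the three reported performance measures, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The abstract does not report the underlying confusion matrix, so the counts below (64 true positives, 5 false negatives, 60 true negatives, 7 false positives) are hypothetical, chosen only because they happen to reproduce the reported percentages. The function name `classification_metrics` is likewise illustrative and not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity as fractions.

    tp/fn: asthma cases classified correctly/incorrectly.
    tn/fp: nonasthma controls classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # all correct calls over all samples
        "sensitivity": tp / (tp + fn),      # fraction of cases detected
        "specificity": tn / (tn + fp),      # fraction of controls correctly excluded
    }


# Hypothetical counts (assumption: NOT reported in the abstract).
metrics = classification_metrics(tp=64, fn=5, tn=60, fp=7)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts the script prints 91.18% accuracy, 92.75% sensitivity, and 89.55% specificity. Other confusion matrices could yield the same rounded values, so the counts should not be read as the study's actual data.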