Keywords: Asthma; Disease risk prediction model; Ensemble methods; GWAS; Genome-wide association study; KoGES; Korean Genome and Epidemiology Study; Large-scale genetic data; Machine learning methods; Oversampling; Penalized methods

MeSH: Humans; Genome-Wide Association Study; Bayes Theorem; Algorithms; Machine Learning; Republic of Korea / epidemiology

Source: DOI:10.1186/s12859-024-05677-x

Abstract:
BACKGROUND: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES).
RESULTS: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator (LASSO), elastic net, smoothly clipped absolute deviation (SCAD), support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to address class imbalance.
CONCLUSIONS: Our results show that penalized methods predict asthma better than machine learning methods. In the oversampling study, however, random forest and boosting methods showed better overall prediction performance than penalized methods.
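The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for the KoGES genotypes, represents the penalized methods by penalized logistic regression, and substitutes plain random oversampling for the three oversampling algorithms, which the abstract does not name. The helper names `random_oversample`, `evaluate`, and `run` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score, balanced_accuracy_score, cohen_kappa_score,
    f1_score, matthews_corrcoef, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import train_test_split


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes have equal size.

    Assumption: a simple stand-in for the unnamed oversampling algorithms.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    extra = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


def evaluate(model, X_test, y_test):
    """Compute the nine metrics listed in the abstract for a fitted binary classifier."""
    prob = model.predict_proba(X_test)[:, 1]
    pred = (prob >= 0.5).astype(int)  # assumed 0.5 decision threshold
    return {
        "auc": roc_auc_score(y_test, prob),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "kappa": cohen_kappa_score(y_test, pred),
        "balanced_accuracy": balanced_accuracy_score(y_test, pred),
        "error_rate": float(np.mean(pred != y_test)),
        "mcc": matthews_corrcoef(y_test, pred),
        "auprc": average_precision_score(y_test, prob),
    }


# Illustrative model set; hyperparameters are arbitrary, not tuned.
MODELS = {
    "ridge": lambda: LogisticRegression(penalty="l2", max_iter=2000),
    "lasso": lambda: LogisticRegression(penalty="l1", solver="liblinear"),
    "random_forest": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
}


def run(oversample=False, seed=0):
    """Fit each model on an imbalanced synthetic data set and score it on a held-out split."""
    X, y = make_classification(
        n_samples=2000, n_features=50, n_informative=10,
        weights=[0.9, 0.1], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    if oversample:
        # Oversample the training split only, so the test split stays untouched.
        X_tr, y_tr = random_oversample(X_tr, y_tr, seed)
    return {name: evaluate(make().fit(X_tr, y_tr), X_te, y_te)
            for name, make in MODELS.items()}


if __name__ == "__main__":
    for flag in (False, True):
        print("oversampled" if flag else "original")
        for name, scores in run(flag).items():
            print(f"  {name:14s} AUC={scores['auc']:.3f} MCC={scores['mcc']:.3f}")
```

Oversampling is applied only to the training split so that the test metrics still reflect the original class ratio. Results on synthetic data say nothing about the ranking reported in the abstract.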