Keywords: Parkinson's disease; SHAP value; machine learning; polygenic risk scores; risk prediction model

MeSH terms: Humans; Parkinson Disease / diagnosis, genetics; Algorithms; Genetic Risk Score; Hospitalization; Machine Learning

Source: DOI: 10.3390/biom13121761

Abstract:
The detection of Parkinson's disease (PD) in its early stages is of great importance for its treatment and management, but there is no consensus on what information is necessary and which models best predict PD risk. In our study, we first grouped PD-associated factors by cost and accessibility and then incorporated them step by step into risk prediction models, which were built with eight commonly used machine learning algorithms to allow a comprehensive assessment. Finally, the Shapley Additive Explanations (SHAP) method was used to investigate the contribution of each factor. We found that models built with demographic variables, hospital admission examinations, clinical assessment, and a polygenic risk score achieved the best prediction performance, and that adding invasive biomarkers did not further improve accuracy. Among the eight machine learning models considered, penalized logistic regression and XGBoost were the most accurate algorithms for assessing PD risk, with penalized logistic regression achieving an area under the curve (AUC) of 0.94 and a Brier score of 0.08. Olfactory function and polygenic risk scores were the most important predictors of PD risk. Our research offers a practical framework for PD risk assessment that highlights the necessary information and efficient machine learning tools.
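The abstract reports discrimination (AUC) and calibration (Brier score) for a penalized logistic regression and uses SHAP to rank predictors. The following is a minimal sketch, not the authors' pipeline, of what those quantities compute on a toy, made-up cohort. It assumes an L2 (ridge) penalty fitted by plain gradient descent; the penalty type, the hyperparameters `lam`, `lr` and `epochs`, and all function and feature names are hypothetical choices for illustration. AUC is computed as the probability that a random case scores above a random control, the Brier score as the mean squared error of predicted probabilities (0 is perfect; always predicting 0.5 gives 0.25), and SHAP values use the closed form for a linear model on the log-odds scale under a feature-independence assumption, phi_j = w_j * (x_j - E[x_j]).

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Predicted risk: sigmoid(bias + w . x)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_l2_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float = 0.1, lr: float = 0.5,
                    epochs: int = 500) -> Tuple[List[float], float]:
    """Hypothetical ridge (L2-penalized) logistic regression via full-batch gradient descent.
    Minimizes mean log-loss + (lam / 2) * ||w||^2; the bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Penalty gradient lam * w shrinks the weights toward zero.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def linear_shap(weights: Sequence[float], x: Sequence[float],
                feature_means: Sequence[float]) -> List[float]:
    """Exact SHAP values on the log-odds scale for a linear model,
    assuming independent features: phi_j = w_j * (x_j - E[x_j])."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, feature_means)]


if __name__ == "__main__":
    # Made-up toy cohort; columns are a hypothetical olfactory score and a hypothetical PRS.
    X = [[0.9, -0.5], [0.8, 0.1], [0.7, -0.2], [0.3, 0.9], [0.2, 0.4], [0.4, 1.2]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = fit_l2_logistic(X, y)
    probs = [predict_proba(w, b, xi) for xi in X]
    means = [sum(col) / len(col) for col in zip(*X)]
    print("AUC:", auc(y, probs), "Brier:", brier(y, probs))
    print("SHAP (log-odds) for first subject:", linear_shap(w, X[0], means))
```

The SHAP values for one subject sum to that subject's logit minus the average logit, which is what makes them an additive attribution. This closed-form shortcut applies only to linear models; a tree ensemble such as XGBoost would need a tree-specific SHAP algorithm (for example TreeSHAP).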