Keywords: Alzheimer’s disease; Interpretability analysis; Machine learning; Prediction model

MeSH terms: Humans; Alzheimer Disease / diagnosis; Biomarkers; Neuroimaging / methods; Machine Learning

Source: DOI: 10.1159/000531819

Abstract:
Background: This study aimed to develop novel machine learning models for predicting Alzheimer's disease (AD) and to identify key factors for targeted prevention.
Methods: We included three populations of participants aged 60 years or older from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database: 1,219 with sociodemographic information only, 863 with both sociodemographic and self-reported health information, and 482 with sociodemographic, self-reported health, and blood biomarker information. Machine learning models were constructed to predict the risk of AD in each of these three populations. Model performance was evaluated in terms of discrimination, calibration, and clinical usefulness. SHapley Additive exPlanations (SHAP) was applied to identify the key predictors in the optimal models.
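The abstract names discrimination, calibration, and clinical usefulness without stating the statistics behind them. A common choice, assumed here rather than taken from the study, is the AUC for discrimination, the Brier score as a calibration-related summary (the study may instead have used calibration curves), and decision-curve net benefit for clinical usefulness. The minimal pure-Python sketch below implements those three measures; the function names are hypothetical.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen case receives a higher
    predicted risk than a randomly chosen non-case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration-related summary: mean squared gap between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Clinical usefulness (decision curve analysis) at risk threshold pt:
    TP/n - FP/n * pt / (1 - pt), treating risk >= pt as 'flag as high risk'."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)
```

For example, with toy outcomes `[1, 1, 0, 0, 1, 0]` and predicted risks `[0.9, 0.6, 0.4, 0.2, 0.3, 0.7]` (unrelated to the study), `auc` returns 6/9 ≈ 0.667, `brier_score` returns 0.225, and `net_benefit(..., 0.5)` returns 1/6 ≈ 0.167. The pairwise AUC loop is O(n²), which is acceptable for cohorts of a few thousand participants; a rank-based formula scales better.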
Results: The mean age was 73.49, 74.52, and 74.29 years in the three populations, respectively. Models using sociodemographic information alone and models using both sociodemographic and self-reported health information showed modest performance. Adding blood biomarker information to the sociodemographic and self-reported health information substantially improved overall performance; among these models, logistic regression performed best, with an AUC of 0.818. The five most important predictors were blood p-tau protein, plasma neurofilament light, age, blood tau protein, and education level. Taurine, inosine, xanthine, marital status, and L-glutamine also contributed to AD prediction.
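For a logistic regression model explained on the log-odds scale, and assuming features are independent relative to a background dataset, SHAP values have a closed form, φ_j(x) = β_j (x_j - E[x_j]), and they sum to f(x) - E[f(X)]. Ranking features by their mean absolute SHAP value is one common way to obtain a "top five" list like the one reported; the abstract does not state which explainer, output scale, or background data the study used. The sketch below uses hypothetical feature names, coefficients, and data, not the study's fitted model.

```python
from typing import List, Sequence, Tuple


def feature_means(data: Sequence[Sequence[float]]) -> List[float]:
    """Column means of the background data (the E[x_j] term)."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def log_odds(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Logistic regression output on the margin (log-odds) scale."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))


def linear_shap(coefs: Sequence[float], means: Sequence[float], x: Sequence[float]) -> List[float]:
    """Exact SHAP values for a linear model with independent features:
    phi_j = beta_j * (x_j - E[x_j]). The intercept cancels out."""
    return [b * (xj - m) for b, xj, m in zip(coefs, x, means)]


def rank_by_mean_abs_shap(
    names: Sequence[str], coefs: Sequence[float], data: Sequence[Sequence[float]]
) -> List[Tuple[str, float]]:
    """Global importance: mean |phi_j| over all rows, sorted from most to least important."""
    means = feature_means(data)
    totals = [0.0] * len(names)
    for row in data:
        for j, phi in enumerate(linear_shap(coefs, means, row)):
            totals[j] += abs(phi)
    scores = [(name, total / len(data)) for name, total in zip(names, totals)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Here the same rows serve as both background and explained data for brevity; in practice one would typically explain held-out participants against a training-set background.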
Conclusion: Interpretable machine learning showed promise for screening individuals at high risk of AD and could further identify key predictors for targeted prevention.