Keywords: Alzheimer disease and related dementias; graph neural network; machine learning; relation importance; risk prediction

MeSH: Humans; Alzheimer Disease / diagnosis; Neural Networks, Computer; Risk Assessment / methods; Algorithms; Female; Aged; Male; Dementia / epidemiology, diagnosis; Machine Learning; Risk Factors

Source: DOI:10.2196/54748

Abstract:
BACKGROUND: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes.
OBJECTIVE: This study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. To address the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method that evaluates relationship importance and its influence on ADRD risk prediction.
METHODS: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method to estimate ADRD likelihood. This self-explainable method provides a feature-importance explanation for ADRD risk prediction by leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's performance. Random forest (RF) and light gradient boosting machine (LGBM) were used as baselines. Using this method, we further identified the key relationships for ADRD risk prediction.
RESULTS: In scenario 1, the VGNN model achieved area under the receiver operating characteristic curve (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set, respectively. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, it obtained AUROC scores of 0.7001 and 0.7187, exceeding the baseline models by 10.1% and 8.5%, respectively. These results show that the graph-based approach outperformed the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, integrating the VGNN model with our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression.
CONCLUSIONS: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.
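The AUROC comparison reported in the results can be illustrated with a minimal sketch. It assumes that the quoted percentage gains are relative differences in AUROC, which the abstract does not state explicitly. The function names `auroc` and `relative_gain_percent` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    is scored above a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_gain_percent(model_auroc, baseline_auroc):
    """Relative AUROC improvement of a model over a baseline, in percent.
    Assumption: gains are relative, not absolute, differences."""
    return 100.0 * (model_auroc - baseline_auroc) / baseline_auroc


# Toy data only: 1 = later ADRD diagnosis, 0 = no diagnosis.
toy_labels = [0, 0, 1, 1]
toy_graph_scores = [0.1, 0.4, 0.35, 0.8]     # hypothetical graph-model risk scores
toy_baseline_scores = [0.5, 0.4, 0.35, 0.8]  # hypothetical tree-baseline risk scores

graph_auc = auroc(toy_labels, toy_graph_scores)
baseline_auc = auroc(toy_labels, toy_baseline_scores)
print(graph_auc, baseline_auc, relative_gain_percent(graph_auc, baseline_auc))
```

The pairwise definition is quadratic in the number of cases, so it suits small illustrations; for cohorts of realistic size a rank-based implementation such as scikit-learn's `roc_auc_score` would be the usual choice.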