Keywords: clinical decision rules; machine learning; nutrition surveys; public health; vitamin D deficiency

MeSH terms: Humans; Child, Preschool; Nutrition Surveys; Machine Learning; Pandemics; Vitamin D Deficiency / epidemiology

Source: DOI:10.3389/fendo.2024.1327058

Abstract:
Vitamin D deficiency is strongly associated with the development of several diseases. Given the current global pandemic of vitamin D deficiency, it is critical to identify people at high risk. No tools exist for predicting the risk of vitamin D deficiency in the general community population. This study aims to use machine learning to predict that risk from data that can be obtained through simple interviews in the community.
The National Health and Nutrition Examination Survey (NHANES) 2001-2018 dataset was used for the analysis and was randomly divided into training and validation sets in a 70:30 ratio. Models were constructed with the gradient boosting machine (GBM), logistic regression (LR), neural network (NNet), random forest (RF), support vector machine (SVM), and XGBoost methods, and their performance was evaluated. The best-performing model was interpreted using SHAP values and further developed into an online web calculator.
The study enrolled 62,919 participants, all aged 2 years or older, of whom 20,204 (32.1%) had vitamin D deficiency. The models constructed by each method were evaluated using AUC as the primary evaluation statistic and ACC, PPV, NPV, SEN, SPE, F1 score, MCC, Kappa, and Brier score as secondary evaluation statistics. The XGBoost-based model showed the best, near-perfect performance. The summary plot of SHAP values shows that the three most important features for this model are race, age, and BMI. An online web calculator based on this model can easily and quickly predict the risk of vitamin D deficiency.
In this study, the XGBoost-based prediction tool predicted the risk of vitamin D deficiency in community populations with near-perfect accuracy.
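The 70:30 random split and the evaluation statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`split_70_30`, `auc`, `evaluate`), the 0.5 classification threshold, and the fixed random seed are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def split_70_30(items, seed=0):
    """Randomly split items into 70% training and 30% validation (seed is an assumption)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def auc(y_true, y_prob):
    """AUC as the chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return _ratio(wins, len(pos) * len(neg))


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the primary (AUC) and secondary statistics for binary labels and probabilities."""
    # The 0.5 threshold is an illustrative assumption, not a value from the study.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, h in pairs if y == 1 and h == 1)
    tn = sum(1 for y, h in pairs if y == 0 and h == 0)
    fp = sum(1 for y, h in pairs if y == 0 and h == 1)
    fn = sum(1 for y, h in pairs if y == 1 and h == 0)
    n = tp + tn + fp + fn

    acc = _ratio(tp + tn, n)
    ppv = _ratio(tp, tp + fp)  # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)  # negative predictive value
    sen = _ratio(tp, tp + fn)  # sensitivity (recall)
    spe = _ratio(tn, tn + fp)  # specificity
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _ratio(acc - p_chance, 1 - p_chance)
    # Brier score: mean squared error of the predicted probabilities.
    brier = _ratio(sum((p - y) ** 2 for y, p in zip(y_true, y_prob)), n)

    return {
        "AUC": auc(y_true, y_prob), "ACC": acc, "PPV": ppv, "NPV": npv,
        "SEN": sen, "SPE": spe, "F1": f1, "MCC": mcc,
        "Kappa": kappa, "Brier": brier,
    }
```

Calling `evaluate(labels, probabilities)` on a validation set returns all ten statistics in one dictionary, so models built with different methods can be compared on the same footing.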