Depression is a prevalent disorder worldwide, with potentially severe implications. It contributes substantially to the risk of diseases associated with multiple risk factors. Early, accurate diagnosis of depressive symptoms is a critical first step toward management, intervention, and prevention. Various nutritional and dietary compounds have been suggested to be involved in the onset, maintenance, and severity of depressive disorders. Although the association between nutritional risk factors and the occurrence of depression remains difficult to characterize, the interplay of these markers has not yet been fully explored with supervised machine learning.
This study aimed to determine the ability of machine learning-based decision support methods to identify the presence of depression using publicly available health data from the Korean National Health and Nutrition Examination Survey. Two exploratory techniques, uniform manifold approximation and projection (UMAP) and Pearson correlation, were used to analyze the datasets. Grid search with cross-validation was used to tune the models' hyperparameters for the highest classification accuracy. Classifier performance was compared using several measures: accuracy, precision, recall, F1 score, the confusion matrix, the areas under the precision-recall and receiver operating characteristic (ROC) curves, and calibration plots. We further investigated feature importance through visual interpretation with ELI5, partial dependence plots, local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP), examining predictions at both the population and individual levels.
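As a minimal illustration of how the threshold-based measures listed above relate to one another, the sketch below derives the confusion matrix, accuracy, precision, recall, and F1 score from binary labels. It assumes 1 encodes depression and 0 its absence; the function names are hypothetical, and this is plain Python for clarity, not the authors' code (a library such as scikit-learn would typically be used in practice).

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for binary labels (assumed: 1 = depression, 0 = none)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, recall, and F1 computed from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    accuracy = (c["tp"] + c["tn"]) / total
    # Return 0.0 instead of dividing by zero when no positives are predicted or present.
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion": c, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels only; not data from the survey.
    metrics = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0],
                                     [1, 0, 0, 1, 0, 1, 1, 0])
    print(metrics["confusion"])  # → {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
    print(metrics["accuracy"])   # → 0.75
```

Accuracy alone can look high when the depressed class is rare, which is one reason to report precision, recall, and F1 alongside it.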
On the original dataset, the best models were XGBoost, with an accuracy of 86.18%, and random forest, with an area under the curve (AUC) of 84.96%; on the quantile-based dataset, XGBoost achieved an accuracy of 86.02% and an AUC of 85.34%. The explanation methods offered complementary views of how relative changes in feature values affect the predictions, allowing important emerging depression risk factors to be identified.
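Unlike accuracy, the AUC does not depend on a single decision threshold. A minimal sketch of the ROC AUC, assuming binary labels (1 = depression) and a continuous model score such as a predicted probability, uses its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. The function name is hypothetical and this is not the authors' code; the pairwise loop is quadratic and meant for illustration only.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores only; three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of roughly 85% indicate that the models rank most depressed respondents above non-depressed ones.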
A strength of our approach is the large sample size used to train the fine-tuned models. The machine learning-based analysis showed that the hyperparameter-tuned model achieved empirically higher accuracy in classifying patients with depressive disorder, as supported by a set of interpretability experiments, and could be an effective solution for disease control.