Keywords: ADASYN; Burn depression checklist; Class balancing; Depression prediction; Feature group partitioning; Machine learning; Oversampling; SMOTE; Stratified cross validation

MeSH: Humans; Machine Learning; Depression / diagnosis; Algorithms; Severity of Illness Index; Sensitivity and Specificity; Female

Source: DOI:10.1186/s12874-024-02249-8

Abstract:
In contemporary society, depression has emerged as a prominent mental disorder whose prevalence exhibits exponential growth and which exerts a substantial influence on premature mortality. Although numerous studies have applied machine learning methods to forecast signs of depression, only a limited number have treated severity level as a multiclass variable. Moreover, an equal data distribution across all classes rarely occurs in real-world populations, so the inevitable class imbalance among multiple classes is a substantial challenge in this domain. This research therefore emphasizes the significance of addressing class imbalance in a multiclass context. We introduce a new approach, Feature Group Partitioning (FGP), in the data preprocessing phase, which effectively reduces feature dimensionality to a minimum. For class balancing, this study uses two synthetic oversampling techniques: the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling. The dataset was collected from university students by administering the Burn Depression Checklist (BDC). For modeling, we implemented heterogeneous ensemble stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. Overfitting was monitored and mitigated by evaluating accuracy on the training, validation, and testing datasets. To assess the effectiveness of the prediction models, we use balanced accuracy, sensitivity, specificity, precision, and F1-score. Overall, a comprehensive analysis contrasts the Conventional Depression Screening (CDS) approach with the FGP approach. In summary, the results show that the stacking classifier using FGP with SMOTE yields the highest balanced accuracy, 92.81%.
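The evaluation indices listed above can all be derived from one-vs-rest counts (TP, FN, FP, TN) for each severity class. The sketch below is a minimal illustration, assuming macro-averaging, i.e. balanced accuracy as the mean of per-class sensitivity (the definition used by scikit-learn's `balanced_accuracy_score`); the abstract does not state which averaging the authors used, and the function names and toy labels are hypothetical.

```python
def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest sensitivity, specificity, precision and F1 for each class."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = n - tp - fn - fp  # every remaining sample is a true negative for class c
        sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall of class c
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        metrics[c] = {"sensitivity": sensitivity, "specificity": specificity,
                      "precision": precision, "f1": f1}
    return metrics


def balanced_accuracy(y_true, y_pred, labels):
    """Macro-average of per-class sensitivity (recall)."""
    m = per_class_metrics(y_true, y_pred, labels)
    return sum(m[c]["sensitivity"] for c in labels) / len(labels)


# Hypothetical toy example: class 0 dominates, classes 1 and 2 are minorities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]
labels = [0, 1, 2]
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred, labels)
print(f"accuracy={plain_acc:.3f} balanced_accuracy={bal_acc:.3f}")
# prints accuracy=0.800 balanced_accuracy=0.667
```

On this toy data, plain accuracy looks good (0.8) because the majority class is always predicted correctly, while balanced accuracy (about 0.667) exposes that half of each minority class is missed. This is why balanced accuracy is the more informative headline metric under class imbalance.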
The empirical evidence demonstrates that the FGP approach, when combined with SMOTE, produces better performance in predicting the severity of depression. Most importantly, the reduced training time achieved by the FGP approach across all classifiers is a significant achievement of this research.
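The class-balancing step can be illustrated with a from-scratch version of the standard SMOTE interpolation rule: each synthetic sample is placed at a random point on the segment between a sample of an under-represented class and one of its k nearest same-class neighbours. This is a minimal sketch, not the authors' pipeline: the function name `smote_multiclass`, the default `k=5`, balancing every class up to the majority-class count, and the toy data are all assumptions. ADASYN uses the same interpolation but allocates more synthetic samples to minority points surrounded by other-class neighbours, which this sketch does not do.

```python
import math
import random
from collections import defaultdict


def smote_multiclass(X, y, k=5, seed=0):
    """Oversample every class up to the majority-class count by SMOTE-style interpolation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append([float(v) for v in xi])
    target = max(len(pts) for pts in by_class.values())

    X_out = [[float(v) for v in xi] for xi in X]
    y_out = list(y)
    for label, pts in by_class.items():
        deficit = target - len(pts)
        # Interpolation needs at least two samples of the class; a singleton
        # class is left unbalanced in this sketch.
        if deficit == 0 or len(pts) < 2:
            continue
        kk = min(k, len(pts) - 1)
        # k nearest same-class neighbours of each sample (Euclidean distance).
        neighbours = []
        for i, p in enumerate(pts):
            ranked = sorted((math.dist(p, q), j) for j, q in enumerate(pts) if j != i)
            neighbours.append([j for _, j in ranked[:kk]])
        for _ in range(deficit):
            i = rng.randrange(len(pts))      # random base sample of this class
            j = rng.choice(neighbours[i])    # random one of its k neighbours
            gap = rng.random()               # interpolation factor in [0, 1)
            synthetic = [a + gap * (b - a) for a, b in zip(pts[i], pts[j])]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 6 samples of class 0, 3 of class 1, 2 of class 2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [0.2, 0.8],
     [5, 5], [6, 5], [5, 6],
     [10, 10], [11, 10]]
y = [0] * 6 + [1] * 3 + [2] * 2
X_bal, y_bal = smote_multiclass(X, y, k=5, seed=42)
print({c: y_bal.count(c) for c in sorted(set(y_bal))})  # {0: 6, 1: 6, 2: 6}
```

In practice, imbalanced-learn's `SMOTE` and `ADASYN` classes provide tested implementations through `fit_resample`. Whichever implementation is used, resampling should be fitted on the training folds only (for example, inside each fold of a stratified cross-validation), so that synthetic points derived from held-out samples do not leak into evaluation.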