Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis

Abstract:

OBJECTIVE: This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression.
METHODS: This review included studies reporting diagnostic results of DL algorithms for depression using speech data, published from database inception to January 31, 2024, and indexed in PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science. Pooled accuracy, sensitivity, and specificity were obtained with random-effects models. The Quality Assessment of Diagnostic Accuracy Studies-2 tool (QUADAS-2) was used to assess the risk of bias.
RESULTS: A total of 25 studies met the inclusion criteria and 8 of them were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy was 0.89 (95% CI, 0.81-0.97) in the handcrafted group.
DISCUSSION: To our knowledge, this is the first meta-analysis of the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, which makes it difficult to assess the performance of other DL algorithms. The handcrafted model performed better than the end-to-end model in speech depression detection.
CONCLUSIONS: The application of DL in speech provided a useful tool for depression detection. CNN models with handcrafted acoustic features could help to improve the diagnostic performance.
REGISTRATION: The study protocol was registered on PROSPERO (CRD42023423603).
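
The random-effects pooling named in METHODS can be illustrated with a minimal sketch. The abstract does not say which estimator or transformation the authors used, so the DerSimonian-Laird estimator and the untransformed proportion scale below are assumptions. The function names and the per-study counts are hypothetical and are not taken from the included studies.

```python
import math


def proportion_with_variance(successes, total):
    """Return a raw proportion and its binomial variance p(1-p)/n.

    Assumption: untransformed proportions. Published analyses often pool on
    a logit or double-arcsine scale instead, which also avoids a zero
    variance when a proportion equals exactly 0 or 1.
    """
    p = successes / total
    return p, p * (1.0 - p) / total


def pool_random_effects(estimates, variances):
    """Pool study estimates with the DerSimonian-Laird random-effects model.

    Returns (pooled estimate, (ci_low, ci_high), tau2), where tau2 is the
    estimated between-study variance and the interval is a 95% Wald CI.
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate of between-study variance, floored at 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


if __name__ == "__main__":
    # Hypothetical (correctly classified, total) counts for three studies.
    studies = [(85, 100), (45, 50), (160, 200)]
    pairs = [proportion_with_variance(s, n) for s, n in studies]
    estimates = [p for p, _ in pairs]
    variances = [v for _, v in pairs]
    pooled, (low, high), tau2 = pool_random_effects(estimates, variances)
    print(f"pooled accuracy = {pooled:.3f} (95% CI, {low:.3f}-{high:.3f}), tau2 = {tau2:.5f}")
```

The same function could be applied separately to per-study sensitivity and specificity. A dedicated diagnostic meta-analysis would more commonly model the two jointly, which this sketch does not attempt.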
