Depression recognition using voice-based pre-training model

Abstract:

Early screening for depression helps patients obtain better diagnosis and treatment. Although voice data has been shown to be effective for depression detection, the problem of insufficient dataset size remains unresolved. We therefore propose an artificial intelligence method for identifying depression. The pre-trained speech model wav2vec 2.0 is used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network is used as the classification model to output depression classification results. The proposed model was fine-tuned on the DAIC-WOZ dataset. In binary classification it attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set; in multi-class classification it attained an accuracy of 0.9481 and an RMSE of 0.3810. In this work the wav2vec 2.0 model was used for depression recognition for the first time, and it showed strong generalization ability. The method is simple and practical, and it can assist doctors in the early screening of depression.
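The abstract reports two metrics, accuracy and RMSE. As a minimal sketch of how such figures can be computed from predicted and reference labels, the pure-Python functions below may help. The function names and the sample labels are hypothetical and do not come from the paper. The sketch also assumes that classes are encoded as integers (0/1 for the binary task, ordinal severity levels for the multi-class task); the abstract does not state how RMSE was actually calculated.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between integer-encoded labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical binary labels (assumed encoding: 0 = not depressed, 1 = depressed).
binary_true = [0, 0, 1, 1, 0, 1, 0, 0]
binary_pred = [0, 0, 1, 0, 0, 1, 0, 0]
binary_acc = accuracy(binary_true, binary_pred)
binary_rmse = rmse(binary_true, binary_pred)

# Hypothetical multi-class labels (assumed encoding: ordinal severity levels 0..3).
multi_true = [0, 1, 2, 3]
multi_pred = [0, 1, 3, 3]
multi_acc = accuracy(multi_true, multi_pred)
multi_rmse = rmse(multi_true, multi_pred)

print(binary_acc, binary_rmse, multi_acc, multi_rmse)
```

With hard 0/1 labels, every wrong prediction contributes a squared error of exactly 1, so RMSE equals the square root of the error rate. With ordinal multi-class labels, RMSE also penalizes how far a wrong prediction lies from the true level, which accuracy alone does not capture.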
