Speech analysis

  • Article type: Journal Article
    Dysarthria, a motor speech disorder caused by muscle weakness or paralysis, severely impacts speech intelligibility and quality of life. The condition is prevalent in neurological diseases such as Parkinson's disease (PD), atypical parkinsonism such as progressive supranuclear palsy (PSP), Huntington's disease (HD), and amyotrophic lateral sclerosis (ALS). Improved intelligibility is not only an outcome that matters to patients but can also serve as a critical endpoint in clinical research and drug development. This study validates a digital measure of speech intelligibility, the ki:SB-M intelligibility score, across various motor speech disorders and languages, following the Digital Medicine Society (DiMe) V3 framework.
    The study used four datasets comprising healthy controls (HCs) and patients with PD, HD, PSP, and ALS from Czech, Colombian, and German populations. Participants' speech intelligibility was assessed with the ki:SB-M intelligibility score, which is derived from automatic speech recognition (ASR) systems. Three validation steps were performed: verification (inter-ASR reliability and temporal consistency), analytical validation (correlation with gold-standard clinical dysarthria scores in each disease), and clinical validation (group comparisons between HCs and patients).
    Verification showed good to excellent inter-rater reliability between ASR systems and fair to good temporal consistency. Analytical validation revealed significant correlations between the ki:SB-M intelligibility score and established clinical measures of speech impairment across all patient groups and languages. Clinical validation demonstrated significant differences in intelligibility scores between pathological groups and healthy controls, indicating the measure's discriminative capability.
    The ki:SB-M intelligibility score is a reliable, valid, and clinically relevant tool for assessing speech intelligibility in motor speech disorders. It holds promise for improving clinical trials through automated, objective, and scalable assessments. Future studies should explore its utility in monitoring disease progression and therapeutic efficacy, and should extend the validation to data from further dysarthrias.
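    The abstract states that the ki:SB-M score is derived from ASR output but does not give its formula. As a rough illustration only, one common way to turn an ASR transcript into an intelligibility-style number is word accuracy, i.e. 1 minus the word error rate (WER) of the transcript against the prompt text. The sketch below assumes that definition; `word_edit_distance` and `intelligibility_proxy` are hypothetical names, and this is not the ki:SB-M algorithm.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def intelligibility_proxy(reference: str, transcript: str) -> float:
    """Hypothetical score in [0, 1]: 1 - WER, clipped at 0 (assumption, not the ki:SB-M formula)."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    prompt = "the cat sat on the mat"
    asr_output = "the cat sat on mat"  # one word missed by the recognizer
    print(round(intelligibility_proxy(prompt, asr_output), 3))  # 0.833
```

    Computing such a score with several ASR engines on the same recordings and comparing the results is one way the inter-ASR reliability mentioned above could be probed.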

  • Article type: Journal Article
    BACKGROUND: Human health is a complex, dynamic concept encompassing a spectrum of states influenced by genetic, environmental, physiological, and psychological factors. Traditional Chinese Medicine categorizes health into nine body constitution types, each reflecting a unique balance or imbalance of vital energies that influences physical, mental, and emotional states. Advances in machine learning offer promising avenues for diagnosing conditions such as Alzheimer's disease, dementia, and respiratory diseases by analyzing speech patterns, enabling complementary non-invasive diagnosis. This study aims to use speech audio to identify subhealth populations characterized by unbalanced constitution types.
    METHODS: Participants aged 18-45 were selected from the Acoustic Study of Health. Audio recordings were collected using ATR2500X-USB microphones and Praat software. Exclusion criteria included recent illness, dental issues, and specific medical histories. The audio data were preprocessed into Mel-frequency cepstral coefficients (MFCCs) for model training. Three deep learning models were implemented in Python to classify health status: a one-dimensional convolutional network (Conv1D), a two-dimensional convolutional network (Conv2D), and a long short-term memory network (LSTM). Saliency maps were generated to provide model explainability.
    RESULTS: The study used 1,378 recordings from balanced (healthy) and 1,413 from unbalanced (subhealth) constitution types. The Conv1D model achieved a training accuracy of 91.91% and a validation accuracy of 84.19%. The Conv2D model had 96.19% training accuracy and 84.93% validation accuracy. The LSTM model showed 92.79% training accuracy and 87.13% validation accuracy, with early signs of overfitting. AUC scores were 0.92 and 0.94 (Conv1D), 0.99 (Conv2D), and 0.97 (LSTM). All models performed robustly, with Conv2D achieving the best discrimination.
    CONCLUSIONS: Deep learning classification of speech audio by body constitution type showed promising results with the Conv1D, Conv2D, and LSTM models. ROC curves, training accuracy, and validation accuracy showed that all models robustly distinguished between balanced and unbalanced constitution types. Conv2D showed the strongest discrimination, while Conv1D and LSTM also performed well, supporting their reliability. The study integrates constitution theory and deep learning to classify subhealth populations with a non-invasive approach, thereby supporting personalized medicine and early intervention strategies.
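    The abstract reports AUC values and training/validation accuracy gaps without showing how they were computed. A minimal, dependency-free sketch of both quantities, assuming binary labels (1 = unbalanced/subhealth, 0 = balanced) and one model score per recording: AUC is computed as the probability that a randomly chosen positive recording scores higher than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. The function names are illustrative and do not come from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Training minus validation accuracy in percentage points; a large gap hints at overfitting."""
    return round(train_acc - val_acc, 2)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    # Accuracies reported in the abstract: (model, training %, validation %).
    for name, tr, va in [("Conv1D", 91.91, 84.19), ("Conv2D", 96.19, 84.93), ("LSTM", 92.79, 87.13)]:
        print(name, generalization_gap(tr, va))
```

    The pairwise loop is quadratic in the number of recordings; for large evaluation sets a rank-based formulation gives the same value more cheaply.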

  • Article type: Journal Article
    Dementia is a general term for several progressive neurodegenerative disorders, including Alzheimer's disease. Timely and accurate detection is crucial for early intervention. Advances in artificial intelligence offer significant potential for using machine learning to aid early detection.
    This review summarizes state-of-the-art machine learning approaches for dementia prediction, focusing on non-invasive methods because they place a lower burden on patients. In particular, analysis of gait and speech performance can offer insights into cognitive health through cost-effective clinical screening.
    A systematic literature review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol. Three electronic databases (Scopus, Web of Science, and PubMed) were searched for relevant studies published between 2017 and 2022. A total of 40 papers were selected for review.
    The most common machine learning methods were support vector machines, followed by deep learning. Studies suggested using multimodal approaches, as these can provide more comprehensive and better prediction performance. Deep learning is still at an early stage in gait studies, as few have applied it. Moreover, including whole-body movement features contributes to better classification accuracy. In speech studies, combining different parameters (acoustic, linguistic, and cognitive testing) produced better results.
    The review highlights the potential of machine learning, particularly non-invasive approaches, for the early prediction of dementia. The comparable prediction accuracies of manual and automatic speech analysis suggest that fully automated dementia detection is within reach.

  • Article type: Editorial
    No abstract available.

  • Article type: Journal Article
    Speech impairments often emerge as one of the primary indicators of Parkinson's disease (PD), although they may not be readily apparent in its early stages. While previous studies focused predominantly on binary PD detection, this research explored using deep learning models to automatically classify sustained vowel recordings as healthy control, mild PD, or severe PD based on motor symptom severity scores. Popular convolutional neural network (CNN) architectures (VGG and ResNet), as well as a vision transformer (Swin), were fine-tuned on log-mel spectrogram image representations of the segmented voice data. The research also investigated the effects of audio segment length and of specific vowel sounds on model performance. Longer segments yielded better performance. The models distinguished PD from healthy subjects well, achieving over 95% precision; however, reliably discriminating between mild and severe PD remained challenging. VGG16 achieved the best overall classification performance, with 91.8% accuracy and the largest area under the ROC curve. Restricting the analysis to the vowel /u/ further improved accuracy to 96%. Visualization techniques such as Grad-CAM showed that the CNN models focused on localized spectrogram regions, whereas the transformers attended to more widespread patterns. Overall, this work shows the potential of deep learning for non-invasive screening and monitoring of PD progression from voice recordings, but larger multi-class labeled datasets are needed to improve severity classification.
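    The study compares audio segment lengths but the abstract does not say how recordings were cut. A minimal NumPy sketch, assuming fixed-length windows with an optional hop (overlap) and that any trailing remainder shorter than one segment is dropped; `segment_audio` is a hypothetical helper. Each resulting segment would then be converted into a log-mel spectrogram image for the CNN or transformer.

```python
from typing import Optional

import numpy as np


def segment_audio(signal: np.ndarray, sr: int, seg_seconds: float,
                  hop_seconds: Optional[float] = None) -> np.ndarray:
    """Split a 1-D waveform into fixed-length segments, shape (n_segments, seg_len).

    hop_seconds=None means non-overlapping segments; the incomplete tail is dropped
    (an assumption for illustration, not necessarily the study's choice).
    """
    seg_len = int(round(seg_seconds * sr))
    hop = seg_len if hop_seconds is None else int(round(hop_seconds * sr))
    if seg_len <= 0 or hop <= 0:
        raise ValueError("segment and hop lengths must be positive")
    if signal.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    if len(signal) < seg_len:
        return np.empty((0, seg_len), dtype=signal.dtype)
    n_segments = 1 + (len(signal) - seg_len) // hop
    # Row k holds sample indices [k*hop, k*hop + seg_len).
    idx = hop * np.arange(n_segments)[:, None] + np.arange(seg_len)[None, :]
    return signal[idx]


if __name__ == "__main__":
    sr = 16_000
    vowel = np.random.default_rng(0).standard_normal(5 * sr)  # stand-in for a 5 s sustained vowel
    for seconds in (0.5, 1.0, 2.0):
        print(seconds, segment_audio(vowel, sr, seconds).shape)
```

    Longer segments give each spectrogram more context but yield fewer training examples per recording, which is the trade-off the segment-length experiment explores.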

  • Article type: Journal Article
    Parkinson's disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One notable non-motor symptom of PD is the presence of vocal disorders, attributed to underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, integrating machine learning (ML) techniques into speech signal analysis has contributed significantly to the detection and diagnosis of PD. In particular, Mel-frequency cepstral coefficients (MFCCs) and gammatone frequency cepstral coefficients (GTCCs) are feature extraction techniques commonly used in speech and audio signal processing that could show great potential for identifying vocal disorders. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. Recordings from the Mobile Device Voice Recordings at King's College London (MDVR-KCL) dataset were used. They were collected from healthy individuals and PD patients while they read a passage and during a spontaneous phone conversation. The speech data from the spontaneous dialogue task were processed with speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. ML applied to MFCCs and GTCCs classified PD patients with a test accuracy of 92.3%. This research further demonstrates the potential of mobile phones as a non-invasive, cost-effective tool for the early detection of PD, which could improve patient prognosis and quality of life.
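    The abstract names MFCCs and GTCCs but not their settings. Below is a minimal NumPy-only sketch of the standard MFCC pipeline (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II). The frame length, hop, number of filters, and number of coefficients are common defaults assumed for illustration, not values from the study. GTCCs follow the same pattern with a gammatone filterbank in place of the mel filters, which is not shown here.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope, peak of 1 at center
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii_matrix(n_out: int, n_in: int) -> np.ndarray:
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)[None, :]
    k = np.arange(n_out)[:, None]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0, :] /= np.sqrt(2.0)
    return mat


def mfcc(signal: np.ndarray, sr: int, n_fft: int = 512, hop: int = 160,
         n_filters: int = 26, n_coeffs: int = 13) -> np.ndarray:
    """MFCC matrix of shape (n_frames, n_coeffs) for a mono signal (assumed default settings)."""
    if signal.ndim != 1 or len(signal) < n_fft:
        raise ValueError("expected a 1-D signal at least one frame long")
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(n_fft)[None, :]
    frames = signal[idx] * np.hamming(n_fft)                    # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # periodogram per frame
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                        # floor avoids log(0)
    return log_mel @ dct_ii_matrix(n_coeffs, n_filters).T


if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic stand-in for a voice recording
    feats = mfcc(tone, sr)
    print(feats.shape)  # (97, 13)
```

    Averaging the frame-level matrix over time (for example `feats.mean(axis=0)`) is one simple way to obtain a fixed-length vector per recording for a conventional classifier; the study's own pooling is not described in the abstract.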

  • Article type: Journal Article
    This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings of 20 male identical-twin speakers engaged in spontaneous dialogues and interviews. Performance was evaluated in three conditions: identical twins only, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using the equal error rate (EER) and the log-likelihood-ratio cost (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, lowering overall speaker recognition accuracy. Analyses based on longer speech samples outperformed those using shorter samples. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimating speaker similarity or dissimilarity over longer stretches of speech than over shorter ones. The study also found varying degrees of likeness among identical twins, with certain pairs posing a greater challenge for the ASR system. These outcomes align with prior research and are discussed in the context of the relevant literature.
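    EER and Cllr are the two metrics the study reports. A minimal pure-Python sketch under simple assumptions: the EER is approximated by sweeping a decision threshold over the observed scores and taking the point where the false-acceptance and false-rejection rates are closest (toolkits often interpolate instead, so values can differ slightly), and Cllr assumes the system outputs calibrated natural-log likelihood ratios. The function names are illustrative and are not SpeechBrain APIs.

```python
import math
from typing import Sequence, Tuple


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER by a threshold sweep; returns (eer, threshold).

    target: scores for same-speaker trials; nontarget: scores for different-speaker trials.
    A trial is accepted when score >= threshold.
    """
    if not target or not nontarget:
        raise ValueError("need both target and non-target scores")
    best_gap, best_eer, best_t = math.inf, 1.0, 0.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)         # same-speaker trials rejected
        far = sum(s >= t for s in nontarget) / len(nontarget)  # different-speaker trials accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2.0, t
    return best_eer, best_t


def _log2_1p_exp(x: float) -> float:
    """Numerically stable log2(1 + e**x)."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2.0)


def cllr(target_llrs: Sequence[float], nontarget_llrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost for natural-log LRs: 0 is perfect, 1 matches an uninformative system."""
    miss_cost = sum(_log2_1p_exp(-llr) for llr in target_llrs) / len(target_llrs)  # log2(1 + 1/LR)
    fa_cost = sum(_log2_1p_exp(llr) for llr in nontarget_llrs) / len(nontarget_llrs)  # log2(1 + LR)
    return 0.5 * (miss_cost + fa_cost)


if __name__ == "__main__":
    same = [0.2, 0.6, 0.9, 0.4]   # toy same-speaker scores
    diff = [0.1, 0.5, 0.3, 0.7]   # toy different-speaker scores
    print(equal_error_rate(same, diff))  # (0.5, 0.5)
```

    Heavily overlapping same-speaker and different-speaker scores, as one would expect for hard twin pairs, push the EER toward 0.5 and Cllr toward or above 1.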

  • Article type: Journal Article
    OBJECTIVE: A clinical decision tool for transient loss of consciousness (TLOC) could reduce the currently high misdiagnosis rates and waiting times for specialist assessment. Most clinical decision tools based on patient-reported symptom inventories distinguish between only two of the three most common causes of TLOC (epilepsy, functional/dissociative seizures (FDS), and syncope) or struggle with the particularly challenging differentiation between epilepsy and FDS. Building on previous research describing differences in spoken accounts of epileptic seizures and FDS, this study explored the feasibility of predicting the cause of TLOC by automatically analyzing a combination of patient-reported symptoms and spoken TLOC descriptions.
    METHODS: Participants completed an online web application consisting of a 34-item medical history and symptom questionnaire (iPEP) and a spoken interaction with a virtual agent (VA) that asked eight questions about their most recent TLOC experience. Support vector machines (SVMs) were trained on different feature combinations using nested leave-one-out cross-validation. The iPEP provided baseline performance. Inspired by previous qualitative research, three spoken-language feature sets were designed to assess (1) formulation effort, (2) the proportion of words from different semantic categories, and (3) verb, adverb, and adjective usage.
    RESULTS: Seventy-six participants completed the application (epilepsy = 24, FDS = 36, syncope = 16); only 61 of them also completed the VA interaction (epilepsy = 20, FDS = 29, syncope = 12). The iPEP model correctly predicted 65.8% of all diagnoses, whereas adding the language features increased accuracy to 85.5% by improving the differential diagnosis between epilepsy and FDS.
    CONCLUSIONS: These findings suggest that automated analysis of TLOC descriptions collected with an online web application and a VA could improve the accuracy of current clinical decision tools for TLOC and facilitate clinical stratification (for example, ensuring appropriate referral to cardiological versus neurological investigation and management pathways).
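    The study trained SVMs with nested leave-one-out cross-validation. To keep this sketch dependency-free, a k-nearest-neighbour classifier stands in for the SVM (a swap, not the authors' model), and the inner loop tunes k rather than SVM hyperparameters. The point is the structure: the outer loop holds out one participant for testing, and model selection uses only the remaining participants, so the held-out case never influences the hyperparameter choice. The helper names and toy feature vectors are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def knn_predict(train_X: List[Vector], train_y: List[str], x: Vector, k: int) -> str:
    """Majority label among the k training points closest to x (Euclidean distance)."""
    neighbours = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(X: List[Vector], y: List[str], k: int) -> float:
    """Plain leave-one-out accuracy for a fixed k (used as the inner loop)."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        hits += pred == y[i]
    return hits / len(X)


def nested_loo_accuracy(X: List[Vector], y: List[str],
                        k_grid: Sequence[int] = (1, 3, 5)) -> Tuple[float, List[int]]:
    """Outer LOO estimates accuracy; inner LOO on the outer training fold picks k."""
    hits, chosen = 0, []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Hyperparameter selection sees only the outer training fold.
        best_k = max(k_grid, key=lambda k: loo_accuracy(train_X, train_y, k))
        chosen.append(best_k)
        hits += knn_predict(train_X, train_y, X[i], best_k) == y[i]
    return hits / len(X), chosen


if __name__ == "__main__":
    # Toy 2-D "feature vectors" for three diagnoses; not real study data.
    X = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11],
         [20, 0], [20, 1], [21, 0], [21, 1]]
    y = ["epilepsy"] * 4 + ["FDS"] * 4 + ["syncope"] * 4
    acc, ks = nested_loo_accuracy(X, y)
    print(acc, ks)
```

    Swapping in a real SVM would only change `knn_predict` and the grid being searched; the two nested loops stay the same.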

  • Article type: Journal Article
    COVID-19, an infectious disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organisation (WHO) in March 2020. By mid-August 2020, more than 21 million people had tested positive worldwide. Infections were growing rapidly, and tremendous efforts were being made to fight the disease. In this paper, we attempt to systematise the various COVID-19 research activities leveraging data science, which we define broadly to encompass the methods and tools (including those from artificial intelligence (AI), machine learning (ML), statistics, modeling, simulation, and data visualization) that can be used to store, process, and extract insights from data. In addition to reviewing the rapidly growing body of recent research, we survey public datasets and repositories that can be used for further work to track COVID-19 spread and mitigation strategies. As part of this, we present a bibliometric analysis of the papers produced in this short span of time. Finally, building on these insights, we highlight common challenges and pitfalls observed across the surveyed works. We also created a live resource repository at https://github.com/Data-Science-and-COVID-19/Leveraging-Data-Science-To-Combat-COVID-19-A-Comprehensive-Review that we intend to keep updated with the latest resources, including new papers and datasets.

  • Article type: Journal Article
    To examine the association of speech and facial features with depression, anxiety, and apathy in older adults with mild cognitive impairment (MCI).
    Speech and facial expressions of 319 MCI patients were digitally recorded with audio and video recording software. Three of the most common neuropsychiatric symptoms (NPS) were evaluated with the Patient Health Questionnaire, the Generalized Anxiety Disorder scale, and the Apathy Evaluation Scale, respectively. Speech and facial features were extracted with open-source data analysis toolkits. Machine learning techniques were used to validate the diagnostic power of the extracted features.
    Different speech and facial features were associated with specific NPS. Depression was associated with spectral and temporal features, and anxiety and apathy with frequency, energy, spectral, and temporal features. Depression was also associated with facial action units (AUs) 10, 12, 15, 17, and 25; anxiety with AUs 10, 15, 17, 25, 26, and 45; and apathy with AUs 5, 26, and 45. Speech and facial features differed significantly between males and females. With machine learning models, the highest accuracies for detecting depression, anxiety, and apathy were 95.8%, 96.1%, and 83.3% for males and 87.8%, 88.2%, and 88.6% for females, respectively.
    Depression, anxiety, and apathy were characterized by distinct speech and facial features. The machine learning models developed in this study classified depression, anxiety, and apathy well. Combining audio and video may provide an objective method for precisely classifying these symptoms.
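    The study reports detection accuracy separately for males and females; the abstract does not say whether separate models were trained per sex or one model's results were stratified. A minimal sketch of the stratified-evaluation step, assuming per-participant true labels, predicted labels, and a group tag; `accuracy_by_group` is a hypothetical helper and the toy labels are invented.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                      groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately within each group (e.g. sex)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have the same length")
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0, 0]  # 1 = symptom present (toy labels)
    pred = [1, 0, 1, 1, 0, 1]
    sex = ["M", "M", "M", "F", "F", "F"]
    print(accuracy_by_group(truth, pred, sex))
```

    Reporting per-group figures in this way makes sex differences in model performance visible, which a single pooled accuracy would hide.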