speech

  • Article type: Journal Article
    BACKGROUND: Developing L2 speaking proficiency can be challenging for learners, particularly when it comes to fostering self-regulation and maintaining engagement. Intelligent Personal Assistants (IPAs) offer a potential solution by providing accessible, interactive language learning opportunities.
    METHODS: This mixed-methods study investigated the effectiveness of using Google Assistant within a learning-oriented assessment (LOA) framework to enhance L2 speaking proficiency, self-regulation, and learner engagement among 54 university-level EFL learners in China. Convenience sampling was used to assign participants to either an experimental group (n = 27) using Google Assistant with tailored activities or a control group (n = 27) using traditional methods. The Oral Proficiency Interview (OPI) assessed speaking performance. Self-reported questionnaires measured L2 motivation, and the Scale of Strategic Self-Regulation for Speaking English as a Foreign Language (S2RS-EFL) evaluated speaking self-regulation. Additionally, semi-structured interviews with a subsample of the experimental group provided qualitative insights.
    RESULTS: The Google Assistant group demonstrated a statistically significant improvement in speaking performance compared to the control group. While no significant difference in motivation was found, thematic analysis of interviews revealed perceived benefits of Google Assistant, including increased accessibility, interactivity, and immediate pronunciation feedback. These features likely contributed to a more engaging learning experience, potentially fostering self-regulation development in line with the core principles of LOA.
    CONCLUSIONS: This study suggests Google Assistant as a promising supplementary tool for enhancing L2 speaking proficiency, learner autonomy, and potentially self-regulation within an LOA framework. Further research is needed to explore its impact on motivation and optimize engagement strategies.
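    A minimal sketch of the kind of between-group comparison the abstract reports, using hypothetical OPI gain scores rather than the study's data; scipy's independent-samples t-test stands in for whatever test the authors actually used:

```python
# Hypothetical illustration, not the study's analysis: compare OPI speaking gains
# between the Google Assistant group and the control group (n = 27 each, as in
# the abstract). All scores below are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gain_assistant = rng.normal(loc=1.2, scale=0.8, size=27)   # placeholder gain scores
gain_control = rng.normal(loc=0.5, scale=0.8, size=27)

t, p = stats.ttest_ind(gain_assistant, gain_control)
pooled_sd = np.sqrt((gain_assistant.var(ddof=1) + gain_control.var(ddof=1)) / 2)
d = (gain_assistant.mean() - gain_control.mean()) / pooled_sd   # Cohen's d effect size
print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```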

  • Article type: Journal Article
    Speech emotion recognition (SER) technology involves feature extraction and prediction models. However, recognition efficiency tends to decrease because of gender differences and the large number of extracted features. Consequently, this paper introduces a gender-based SER system. First, gender and emotion features are extracted from speech signals to develop gender recognition and emotion classification models. Second, according to gender differences, distinct emotion recognition models are established for male and female speakers. The gender of speakers is determined before executing the corresponding emotion model. Third, the accuracy of these emotion models is enhanced by using an advanced differential evolution algorithm (ADE) to select optimal features. ADE incorporates new difference vectors, mutation operators, and position learning, which effectively balance global and local searches. A new position-repairing method is proposed to address gender differences. Finally, experiments on four English datasets demonstrate that ADE is superior to the comparison algorithms in recognition accuracy, recall, precision, F1-score, number of features used, and execution time. The findings highlight the significance of gender in refining emotion models, while mel-frequency cepstral coefficients are important factors in gender differences.
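    A rough sketch of the two ideas in the abstract, gender-branched emotion models and evolutionary feature selection; scipy's generic differential_evolution stands in for the paper's ADE, and all data are synthetic placeholders:

```python
# Illustrative only: train one emotion model per gender group and select features
# with a differential-evolution search. scipy's differential_evolution is a generic
# stand-in for the paper's ADE; features, labels, and genders are synthetic.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, d = 200, 20
X = rng.normal(size=(n, d))              # e.g., MFCC-based feature vectors
gender = rng.integers(0, 2, size=n)      # assumed output of a gender classifier
emotion = rng.integers(0, 4, size=n)     # 4 emotion classes

def neg_cv_accuracy(weights, X, y):
    mask = weights > 0.5                 # continuous candidate -> feature subset
    if not mask.any():
        return 1.0
    return -cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

models = {}
for g in (0, 1):                         # a distinct emotion model per gender
    Xg, yg = X[gender == g], emotion[gender == g]
    res = differential_evolution(neg_cv_accuracy, bounds=[(0, 1)] * d,
                                 args=(Xg, yg), maxiter=5, seed=0, polish=False)
    mask = res.x > 0.5
    models[g] = (mask, SVC().fit(Xg[:, mask], yg))

print({g: int(mask.sum()) for g, (mask, _) in models.items()})  # features kept per model
```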

  • Article type: Journal Article
    OBJECTIVE: This study aims to assess the potential efficacy of cochlear implantation as a treatment for patients with Waardenburg syndrome (WS) and to guide clinical work by comparing auditory and speech recovery after cochlear implantation in patients with and without WS.
    METHODS: PubMed, the Cochrane Library, CNKI, and Wanfang Data were searched for literature on cochlear implantation in WS, and clinical data meeting the inclusion criteria were meta-analyzed using RevMan 5.4.1.
    RESULTS: A total of nine articles were included in this study, comprising 132 patients with WS and 815 patients in the control group. Meta-analysis showed no significant differences between the WS group and the control group in scores on the Categories of Auditory Performance (CAP), the Speech Intelligibility Rating (SIR), or the Parents' Evaluation of Aural/Oral Performance of Children (PEACH).
    CONCLUSIONS: Cochlear implantation demonstrates comparable auditory and speech recovery outcomes for WS and non-WS patients.
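    For readers unfamiliar with what RevMan computes in a comparison like this, a minimal sketch of DerSimonian-Laird random-effects pooling of study-level mean differences; all numbers below are hypothetical placeholders, not the review's data:

```python
# Hypothetical illustration of random-effects pooling (DerSimonian-Laird), the kind
# of computation RevMan performs for, e.g., CAP score differences (WS vs. control).
import numpy as np

md = np.array([0.3, -0.1, 0.2, 0.0])     # placeholder per-study mean differences
se = np.array([0.25, 0.30, 0.20, 0.35])  # placeholder standard errors

w = 1 / se**2
fixed = np.sum(w * md) / w.sum()
q = np.sum(w * (md - fixed)**2)
c = w.sum() - np.sum(w**2) / w.sum()
tau2 = max(0.0, (q - (len(md) - 1)) / c)  # between-study variance
w_re = 1 / (se**2 + tau2)
pooled = np.sum(w_re * md) / w_re.sum()
ci_half = 1.96 * np.sqrt(1 / w_re.sum())
print(f"pooled MD = {pooled:.2f}, 95% CI = ({pooled - ci_half:.2f}, {pooled + ci_half:.2f})")
```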

  • Article type: Journal Article
    Abnormal speech prosody has been widely reported in individuals with autism. Many studies on children and adults with autism spectrum disorder speaking a non-tonal language showed deficits in using prosodic cues to mark focus. However, focus marking by autistic children speaking a tonal language is rarely examined. Cantonese-speaking children may face additional difficulties because tonal languages require them to use prosodic cues to achieve multiple functions simultaneously, such as lexical contrasting and focus marking. This study bridges this research gap by acoustically evaluating the use of Cantonese speech prosody to mark information structure by Cantonese-speaking children with and without autism spectrum disorder. We designed speech production tasks to elicit natural broad and narrow focus production among these children in sentences with different tone combinations. Acoustic correlates of prosodic focus marking, such as the f0, duration, and intensity of each syllable, were analyzed to examine the effects of participant group, focus condition, and lexical tone. Our results showed differences in focus marking patterns between Cantonese-speaking children with and without autism spectrum disorder. The autistic children not only showed insufficient on-focus expansion in terms of f0 range and duration when marking focus, but also produced less distinctive tone shapes in general. There was no evidence that prosodic complexity (i.e., sentences with single tones or combinations of tones) significantly affected focus marking in these autistic children or their typically developing (TD) peers.
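    A sketch of how the three acoustic correlates named above (f0, duration, intensity) could be measured per syllable; it assumes a mono WAV file and known syllable boundaries, and the file name and boundary times are placeholders:

```python
# Sketch of per-syllable acoustic measurements (f0, duration, RMS intensity).
# "utterance.wav" and the syllable boundaries are placeholders, not study data.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
rms = librosa.feature.rms(y=y)[0]
times_f0 = librosa.times_like(f0, sr=sr)
times_rms = librosa.times_like(rms, sr=sr)

syllables = [(0.00, 0.21), (0.21, 0.45), (0.45, 0.80)]   # placeholder boundaries (s)
for start, end in syllables:
    f0_seg = f0[(times_f0 >= start) & (times_f0 < end)]
    rms_seg = rms[(times_rms >= start) & (times_rms < end)]
    print(f"{start:.2f}-{end:.2f}s  dur={end - start:.2f}s  "
          f"mean f0={np.nanmean(f0_seg):.1f} Hz  "
          f"f0 range={np.nanmax(f0_seg) - np.nanmin(f0_seg):.1f} Hz  "
          f"mean RMS={rms_seg.mean():.4f}")
```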

  • Article type: Journal Article
    OBJECTIVE: This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression.
    METHODS: This review included studies reporting diagnostic results of DL algorithms in depression using speech data, published from inception to January 31, 2024, in the PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were obtained with random-effects models. The Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) was used to assess the risk of bias.
    RESULTS: A total of 25 studies met the inclusion criteria, and 8 of them were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy, 0.89 (95% CI, 0.81-0.97), was found in the handcrafted group.
    DISCUSSION: To our knowledge, our study is the first meta-analysis of the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, which makes it difficult to assess the performance of other DL algorithms. The handcrafted models performed better than end-to-end models in speech-based depression detection.
    CONCLUSIONS: The application of DL to speech provides a useful tool for depression detection. CNN models with handcrafted acoustic features could help improve diagnostic performance.
    REGISTRATION: The study protocol was registered on PROSPERO (CRD42023423603).
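    As a reminder of what the pooled figures above summarize, a tiny sketch of the per-study metrics computed from the confusion matrix of a speech-based depression classifier; the counts are hypothetical:

```python
# Hypothetical confusion-matrix counts for one primary study, not review data.
tp, fp, fn, tn = 42, 9, 11, 58

sensitivity = tp / (tp + fn)             # recall for the depressed class
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}  accuracy={accuracy:.2f}")
```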

  • Article type: Journal Article
    Recent research has extensively reported the phenomenon of inter-brain neural coupling between speakers and listeners during speech communication. Yet, the specific speech processes underlying this neural coupling remain elusive. To bridge this gap, this study estimated the correlation between the temporal dynamics of speaker-listener neural coupling and speech features, utilizing two inter-brain datasets that account for different noise levels and listeners' language experience (native vs. non-native). We first derived time-varying speaker-listener neural coupling, extracted an acoustic feature (the envelope) and semantic features (entropy and surprisal) from speech, and then explored their correlational relationship. Our findings reveal that in clear conditions, speaker-listener neural coupling correlates with semantic features. However, as noise increases, this correlation remains significant only for native listeners. For non-native listeners, neural coupling correlates predominantly with the acoustic feature rather than the semantic features. These results reveal how speaker-listener neural coupling is associated with acoustic and semantic features under various scenarios, enriching our understanding of inter-brain neural mechanisms during natural speech communication. We therefore advocate for more attention to the dynamic nature of speaker-listener neural coupling and its modeling with multilevel speech features.
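    A simplified sketch of the correlational idea: derive an amplitude envelope from a speech signal and correlate it with a time-varying coupling series in sliding windows. Both signals below are synthetic placeholders, not the study's EEG-based coupling estimates:

```python
# Illustrative only: speech envelope vs. a time-varying coupling series, with
# sliding-window Pearson correlations. All signals are synthetic placeholders.
import numpy as np
from scipy.signal import hilbert
from scipy.stats import pearsonr

fs = 100                                          # Hz, after downsampling
t = np.arange(0, 60, 1 / fs)
speech = np.random.default_rng(2).normal(size=t.size)    # placeholder waveform
envelope = np.abs(hilbert(speech))                        # amplitude envelope

coupling = np.convolve(envelope, np.ones(200) / 200, mode="same") \
           + 0.5 * np.random.default_rng(3).normal(size=t.size)  # placeholder coupling

win = 5 * fs                                      # 5-second windows
r_values = [pearsonr(envelope[i:i + win], coupling[i:i + win])[0]
            for i in range(0, t.size - win, win)]
print(np.round(r_values, 2))
```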

  • Article type: Journal Article
    OBJECTIVE: Loneliness impacts the health of many older adults, yet effective and targeted interventions are lacking. Compared to surveys, speech data can capture the personalized experience of loneliness. In this proof-of-concept study, we used Natural Language Processing to extract novel linguistic features and AI approaches to identify linguistic features that distinguish lonely adults from non-lonely adults.
    METHODS: Participants completed the UCLA Loneliness Scale and semi-structured interviews (sections: social relationships, loneliness, successful aging, meaning/purpose in life, wisdom, technology and successful aging). We used the Linguistic Inquiry and Word Count (LIWC-22) program to analyze linguistic features and built a classifier to predict loneliness. Each interview section was analyzed using an explainable AI (XAI) model to classify loneliness.
    RESULTS: The sample included 97 older adults (aged 66-101 years, 65% women). The model had high accuracy (0.889; AUC: 0.8), a high F1 score (0.8), and high recall (1.0). The sections on social relationships and loneliness were most important for classifying loneliness. Social themes, conversational fillers, and pronoun usage were important features for classifying loneliness.
    CONCLUSIONS: XAI approaches can be used to detect loneliness through the analyses of unstructured speech and to better understand the experience of loneliness.
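    A schematic of the classify-then-explain pattern the abstract describes. LIWC-22 is proprietary, so placeholder linguistic features are used here, and sklearn's permutation importance stands in for the paper's XAI model:

```python
# Sketch under stated assumptions: placeholder linguistic features, a generic
# classifier, and permutation importance as a simple stand-in for the XAI step.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
feature_names = ["social_words", "fillers", "first_person_pronouns", "positive_emotion"]
X = rng.normal(size=(97, len(feature_names)))     # placeholder per-interview features
y = rng.integers(0, 2, size=97)                   # 1 = lonely, 0 = not lonely (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(feature_names, imp.importances_mean), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```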

  • Article type: Journal Article
    Considerable work has investigated similarities between the processing of music and language, but it remains unclear whether typical, genuine music can influence speech processing via cross-domain priming. To investigate this, we measured ERPs to musical phrases and to syntactically ambiguous Chinese phrases that could be disambiguated by early or late prosodic boundaries. Musical primes also had either early or late prosodic boundaries, and we asked participants to judge whether the prime and target had the same structure. Within musical phrases, prosodic boundaries elicited reduced N1 and enhanced P2 components (relative to the no-boundary condition), and musical phrases with late boundaries exhibited a closure positive shift (CPS) component. More importantly, primed target phrases elicited a smaller CPS compared to non-primed phrases, regardless of the type of ambiguous phrase. These results suggest that prosodic priming can occur across domains, supporting the existence of common neural processes in music and language processing.
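    A toy sketch of the kind of CPS measurement behind these comparisons: mean amplitude in an assumed post-boundary window, computed on synthetic epochs rather than the study's EEG data:

```python
# Synthetic illustration, not the study's ERP pipeline: compare mean amplitude in an
# assumed post-boundary window (a crude proxy for the CPS) across two conditions.
import numpy as np
from scipy import stats

fs = 250                                   # Hz
times = np.arange(-0.2, 1.0, 1 / fs)       # epoch from -200 ms to 1000 ms
rng = np.random.default_rng(5)
primed = rng.normal(0.0, 1.0, size=(30, times.size))      # 30 synthetic trials
nonprimed = rng.normal(0.3, 1.0, size=(30, times.size))   # larger positivity = larger CPS

window = (times >= 0.4) & (times <= 0.8)   # assumed CPS window, 400-800 ms
cps_primed = primed[:, window].mean(axis=1)
cps_nonprimed = nonprimed[:, window].mean(axis=1)
t, p = stats.ttest_ind(cps_primed, cps_nonprimed)
print(f"primed={cps_primed.mean():.2f} µV  non-primed={cps_nonprimed.mean():.2f} µV  p={p:.3f}")
```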

  • Article type: Journal Article
    Existing end-to-end speech recognition methods typically employ hybrid decoders based on CTC and Transformer. However, the issue of error accumulation in these hybrid decoders hinders further improvements in accuracy. Additionally, most existing models are built upon the Transformer architecture, which tends to be complex and unfriendly to small datasets. Hence, we propose a Nonlinear Regularization Decoding Method for Speech Recognition. First, we introduce a nonlinear Transformer decoder, breaking away from traditional left-to-right or right-to-left decoding orders and enabling associations between any characters, mitigating the limitations of Transformer architectures on small datasets. Second, we propose a novel regularization attention module to optimize the attention score matrix, reducing the impact of early errors on later outputs. Finally, we introduce a tiny model to address the challenge of an overly large number of model parameters. The experimental results indicate that our model performs well. Compared to the baseline, our model achieves recognition improvements of 0.12%, 0.54%, 0.51%, and 1.2% on the Aishell1, Primewords, Free ST Chinese Corpus, and Uyghur Common Voice 16.1 datasets, respectively.
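    The abstract does not specify its regularization attention module, so the sketch below only shows one generic way to attach a penalty to an attention score matrix (an entropy regularizer inside scaled dot-product attention); treat it as a hypothetical illustration, not the paper's method:

```python
# Hypothetical illustration only: generic scaled dot-product attention with an
# extra regularization term on the attention distribution.
import torch
import torch.nn.functional as F

def attention_with_entropy_penalty(q, k, v, penalty_weight=0.01):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    attn = F.softmax(scores, dim=-1)
    # entropy regularizer over attention rows; added to the training loss
    entropy = -(attn * torch.log(attn + 1e-9)).sum(dim=-1).mean()
    reg_loss = penalty_weight * entropy
    return attn @ v, reg_loss

q = torch.randn(2, 8, 10, 64)   # (batch, heads, tokens, dim)
k = torch.randn(2, 8, 10, 64)
v = torch.randn(2, 8, 10, 64)
out, reg = attention_with_entropy_penalty(q, k, v)
print(out.shape, reg.item())
```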

  • Article type: Journal Article
    Lip language recognition urgently needs wearable and easy-to-use interfaces for interference-free, high-fidelity lip-reading acquisition, along with accompanying data-efficient decoder-modeling methods. Existing solutions suffer from unreliable lip reading, are data hungry, and exhibit poor generalization. Here, we propose a wearable lip language decoding technology that enables interference-free and high-fidelity acquisition of lip movements and data-efficient recognition of fluent lip language, based on wearable motion capture and continuous lip speech movement reconstruction. The method allows us to artificially generate any desired continuous speech dataset from a very limited corpus of word samples from users. By using these artificial datasets to train the decoder, we achieve an average accuracy of 92.0% across individuals (n = 7) for actual continuous and fluent lip speech recognition of 93 English sentences, with no observed training burden on users, because all training datasets are generated artificially. Our method greatly reduces users' training/learning load and presents a data-efficient and easy-to-use paradigm for lip language recognition.
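    A sketch of the data-generation idea under simplifying assumptions: word-level lip-motion clips (frames × channels arrays) are concatenated with short cross-fades to synthesize continuous training sequences. The clips and word list below are placeholders:

```python
# Simplified illustration of synthesizing "continuous" sequences from word-level
# motion-capture clips. Clips and sentences are placeholders, not the paper's data.
import numpy as np

rng = np.random.default_rng(6)
word_clips = {w: rng.normal(size=(rng.integers(20, 40), 6))   # placeholder clips
              for w in ["hello", "how", "are", "you"]}

def synthesize_sentence(words, blend=5):
    out = word_clips[words[0]]
    for w in words[1:]:
        nxt = word_clips[w]
        alpha = np.linspace(0, 1, blend)[:, None]
        # cross-fade the last `blend` frames of the running sequence into the next clip
        overlap = (1 - alpha) * out[-blend:] + alpha * nxt[:blend]
        out = np.concatenate([out[:-blend], overlap, nxt[blend:]], axis=0)
    return out

sentence = synthesize_sentence(["hello", "how", "are", "you"])
print(sentence.shape)   # (frames, channels) of one artificial continuous sample
```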