speech intelligibility

  • Article type: Journal Article
    Evidence accumulates that the cerebellum's role in the brain is not restricted to motor functions. Rather, cerebellar activity seems to be crucial for a variety of tasks that rely on precise event timing and prediction. Due to its complex structure and importance in communication, human speech requires a particularly precise and predictive coordination of neural processes to be successfully comprehended. Recent studies proposed that the cerebellum is indeed a major contributor to speech processing, but how this contribution is achieved mechanistically remains poorly understood. The current study aimed to reveal a mechanism underlying cortico-cerebellar coordination and demonstrate its speech-specificity. In a reanalysis of magnetoencephalography data, we found that activity in the cerebellum aligned to rhythmic sequences of noise-vocoded speech, irrespective of its intelligibility. We then tested whether these "entrained" responses persist, and how they interact with other brain regions, when a rhythmic stimulus stopped and temporal predictions had to be updated. We found that only intelligible speech produced sustained rhythmic responses in the cerebellum. During this "entrainment echo," but not during rhythmic speech itself, cerebellar activity was coupled with that in the left inferior frontal gyrus, and specifically at rates corresponding to the preceding stimulus rhythm. This finding represents evidence for specific cerebellum-driven temporal predictions in speech processing and their relay to cortical regions.

  • Article type: Journal Article
    BACKGROUND: Automatic speech recognition (ASR) can potentially help older adults and people with disabilities reduce their dependence on others and increase their participation in society. However, maxillectomy patients with reduced speech intelligibility may encounter some problems using such technologies.
    OBJECTIVE: To investigate the accuracy of three commonly used ASR platforms when used by Japanese maxillectomy patients with and without their obturator placed.
    METHODS: Speech samples were obtained from 29 maxillectomy patients with and without their obturator and 17 healthy volunteers. The samples were input into three speaker-independent speech recognition platforms and the transcribed text was compared with the original text to calculate the syllable error rate (SER). All participants also completed a conventional speech intelligibility test to grade their speech using Taguchi's method. A comprehensive articulation assessment of patients without their obturator was also performed.
    RESULTS: Significant differences in SER were observed between healthy and maxillectomy groups. Maxillectomy patients with an obturator showed a significant negative correlation between speech intelligibility scores and SER. However, for those without an obturator, no significant correlations were observed. Furthermore, for maxillectomy patients without an obturator, significant differences were found between syllables grouped by vowels. Syllables containing /i/, /u/ and /e/ exhibited higher error rates compared to those containing /a/ and /o/. Additionally, significant differences were observed when syllables were grouped by consonant place of articulation and manner of articulation.
    CONCLUSIONS: The three platforms performed well for healthy volunteers and maxillectomy patients with their obturator, but the SER for maxillectomy patients without their obturator was high, rendering the platforms unusable. System improvement is needed to increase accuracy for maxillectomy patients.
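The abstract above does not say how the syllable error rate was scored. As a minimal sketch, assuming the usual edit-distance definition (substitutions, deletions, and insertions divided by the number of reference syllables) and assuming both texts have already been split into syllables, the comparison could look like this. The function names and the sample syllables are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two syllable sequences."""
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def syllable_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one syllable")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution (ki -> ti) and one deletion (ke).
reference = ["ka", "ki", "ku", "ke", "ko"]
recognized = ["ka", "ti", "ku", "ko"]
print(syllable_error_rate(reference, recognized))  # 2 errors / 5 syllables
```

Under this definition the rate can exceed 1.0 when the recognizer inserts many extra syllables, so it is an error rate and not a bounded proportion.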

  • Article type: Journal Article
    Understanding speech in noisy environments is a challenging task, especially in communication situations with several competing speakers. Despite their ongoing improvement, assistive listening devices and speech processing approaches still do not perform well enough in noisy multi-talker environments, as they may fail to restore the intelligibility of a speaker of interest among competing sound sources. In this study, a quasi-causal deep learning algorithm was developed that can extract the voice of a target speaker, as indicated by a short enrollment utterance, from a mixture of multiple concurrent speakers in background noise. Objective evaluation with computational metrics demonstrated that the speaker-informed algorithm successfully extracts the target speaker from noisy multi-talker mixtures. This was achieved using a single algorithm that generalized to unseen speakers, different numbers of speakers and relative speaker levels, and different speech corpora. Double-blind sentence recognition tests on mixtures of one, two, and three speakers in restaurant noise were conducted with listeners with normal hearing and listeners with hearing loss. Results indicated significant intelligibility improvements with the speaker-informed algorithm of 17% and 31% for people without and with hearing loss, respectively. In conclusion, it was demonstrated that deep learning-based speaker extraction can enhance speech intelligibility in noisy multi-talker environments where uninformed speech enhancement methods fail.

  • Article type: Journal Article
    While listening, we commonly participate in simultaneous activities. For instance, at receptions people often stand while engaging in conversation. It is known that listening and postural control are associated with each other. Previous studies focused on the interplay of listening and postural control when the speech identification task had rather high cognitive control demands. This study aimed to determine whether listening and postural control interact when the speech identification task requires minimal cognitive control, i.e., when words are presented without background noise, or a large memory load. This study included 22 young adults, 27 middle-aged adults, and 21 older adults. Participants performed a speech identification task (auditory single task), a postural control task (posture single task) and combined postural control and speech identification tasks (dual task) to assess the effects of multitasking. The difficulty levels of the listening and postural control tasks were manipulated by altering the level of the words (25 or 30 dB SPL) and the mobility of the platform (stable or moving). The sound level was increased for adults with a hearing impairment. In the dual-task, listening performance decreased, especially for middle-aged and older adults, while postural control improved. These results suggest that even when cognitive control demands for listening are minimal, interaction with postural control occurs. Correlational analysis revealed that hearing loss was a better predictor than age of speech identification and postural control.

  • Article type: Journal Article
    Since its creation, the coordinate response measure (CRM) corpus has been applied in hundreds of studies to explore the mechanisms of informational masking in multi-talker situations, but also in speech-in-noise or auditory attentional tasks. Here, we present its French version, with equivalent content to the original version in English. Furthermore, an evaluation of speech-on-speech intelligibility in French shows informational masking with similar result patterns to the original data in English. This validation of the French CRM corpus supports the use of the CRM for intelligibility tests in French, and for comparisons with a foreign language under masking conditions.

  • Article type: Journal Article
    Speech-recognition tests are widely used in both clinical and research audiology. The purpose of this study was the development of a novel speech-recognition test that combines concepts of different speech-recognition tests to reduce training effects and allows for a large set of speech material. The new test consists of four different words per trial in a meaningful construct with a fixed structure, the so-called phrases. Various free databases were used to select the words and to determine their frequency. Highly frequent nouns were grouped into thematic categories and combined with related adjectives and infinitives. After discarding inappropriate and unnatural combinations, and eliminating duplications of (sub-)phrases, a total number of 772 phrases remained. Subsequently, the phrases were synthesized using a text-to-speech system. The synthesis significantly reduces the effort compared to recordings with a real speaker. After excluding outliers, measured speech-recognition scores for the phrases with 31 normal-hearing participants at fixed signal-to-noise ratios (SNR) revealed speech-recognition thresholds (SRT) for each phrase varying up to 4 dB. The median SRT was -9.1 dB SNR and thus comparable to existing sentence tests. The psychometric function's slope of 15 percentage points per dB is also comparable and enables efficient use in audiology. Summarizing, the principle of creating speech material in a modular system has many potential applications.
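To make the two reported numbers concrete, here is a minimal sketch of a psychometric function built from them. It assumes a logistic shape and assumes the SRT is the 50%-correct point; the abstract states neither, so both are illustrative choices. A logistic with steepness k has a midpoint slope of k/4, so a slope of 0.15 per dB corresponds to k = 0.6.

```python
import math


def psychometric(snr_db: float,
                 srt_db: float = -9.1,
                 slope_per_db: float = 0.15) -> float:
    """Proportion correct at a given SNR under an assumed logistic model.

    srt_db is taken as the 50%-correct point and slope_per_db as the slope
    at that point, in proportion per dB (0.15 = 15 percentage points).
    """
    k = 4.0 * slope_per_db  # logistic steepness giving the requested midpoint slope
    return 1.0 / (1.0 + math.exp(-k * (snr_db - srt_db)))


# Evaluate the assumed function below, at, and above the reported median SRT.
for snr in (-12.0, -9.1, -6.0):
    print(f"{snr:6.1f} dB SNR -> {100 * psychometric(snr):5.1f} % correct")
```

With a slope this steep, a change of a few dB around the SRT moves the score across most of its range. That steepness is what allows a threshold to be estimated from few trials.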

  • Article type: Journal Article
    Listeners with normal audiometric thresholds show substantial variability in their ability to understand speech in noise (SiN). These individual differences have been reported to be associated with a range of auditory and cognitive abilities. The present study addresses the association between SiN processing and the individual susceptibility of short-term memory to auditory distraction (i.e., the irrelevant sound effect [ISE]). In a sample of 67 young adult participants with normal audiometric thresholds, we measured speech recognition performance in a spatial listening task with two interfering talkers (speech-in-speech identification), audiometric thresholds, binaural sensitivity to the temporal fine structure (interaural phase differences [IPD]), serial memory with and without interfering talkers, and self-reported noise sensitivity. Speech-in-speech processing was not significantly associated with the ISE. The most important predictors of high speech-in-speech recognition performance were a large short-term memory span, low IPD thresholds, bilaterally symmetrical audiometric thresholds, and low individual noise sensitivity. Surprisingly, the susceptibility of short-term memory to irrelevant sound accounted for a substantially smaller amount of variance in speech-in-speech processing than the nondisrupted short-term memory capacity. The data confirm the role of binaural sensitivity to the temporal fine structure, although its association to SiN recognition was weaker than in some previous studies. The inverse association between self-reported noise sensitivity and SiN processing deserves further investigation.

  • Article type: Journal Article
    When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167-90, 2004a, Journal of Econometrics, 128(2), 301-23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78-103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.

  • Article type: Journal Article
    Although the telephone band (0.3-3 kHz) provides sufficient information for speech recognition, the contribution of the non-telephone band (<0.3 and >3 kHz) is unclear. To investigate its contribution, speech intelligibility and talker identification were evaluated using consonants, vowels, and sentences. The non-telephone band produced relatively good intelligibility for consonants (76.0%) and sentences (77.4%), but not vowels (11.5%). The non-telephone band supported good talker identification only with sentences (74.5%), but not vowels (45.8%) or consonants (10.8%). Furthermore, the non-telephone band cannot produce satisfactory speech intelligibility in noise at the sentence level, suggesting the importance of full-band access in realistic listening.

  • Article type: Journal Article
    Communication breakdowns and their repair by deaf and hard-of-hearing (DHH) and hearing adolescents were examined in conversation with an unfamiliar communication partner.
    This study compared the number and type of clarification requests and responses to those requests of 16 DHH and 16 normal-hearing adolescents aged 11-16 years, in a 10-minute conversation with an unfamiliar adult. It also analyzed the relationship between speech intelligibility, communication breakdowns, and clarification requests by an unfamiliar adult. The Children's Communication Checklist (CCC) was completed by parents.
    DHH adolescents demonstrated significantly higher usage of nonverbal clarification requests and verbal and nonverbal responses to clarification requests compared to normal-hearing adolescents in conversations with an unfamiliar adult. Furthermore, the subscale scores of the CCC and the speech intelligibility of DHH adolescents were significantly lower than those of normal-hearing adolescents. There were correlations between speech intelligibility and the speech subscale score of the CCC, as well as correlations between the pragmatic composite score of the CCC, the number of communication breakdowns, and the number of clarification requests by an unfamiliar adult.
    The adolescents with DHH experienced more communication breakdowns in conversation with an unfamiliar adult and the number of clarification requests made by adults was higher.