Speech Acoustics

  • Article type: Journal Article
    BACKGROUND: Digital speech assessment has potential relevance in the earliest, preclinical stages of Alzheimer's disease (AD). We evaluated the feasibility, test-retest reliability, and association with AD-related amyloid-beta (Aβ) pathology of speech acoustics measured over multiple assessments in a remote setting.
    METHODS: Fifty cognitively unimpaired adults (age 68 ± 6.2 years, 58% female, 46% Aβ-positive) completed remote, tablet-based speech assessments (i.e., picture description, journal-prompt storytelling, and verbal fluency tasks) for five days. The testing paradigm was repeated after 2-3 weeks. Acoustic speech features were automatically extracted from the voice recordings, and mean scores were calculated over the 5-day period. We assessed feasibility by adherence rates and usability ratings on the System Usability Scale (SUS) questionnaire. Test-retest reliability was examined with intraclass correlation coefficients (ICCs). We investigated the associations between acoustic features and Aβ pathology using linear regression models adjusted for age, sex, and education.
    RESULTS: The speech assessment was feasible, indicated by 91.6% adherence and usability scores of 86.0 ± 9.9. High reliability (ICC ≥ 0.75) was found across averaged speech samples. Aβ-positive individuals displayed a higher pause-to-word ratio in picture description (B = -0.05, p = 0.040) and journal-prompt storytelling (B = -0.07, p = 0.032) than Aβ-negative individuals, although this effect lost significance after correction for multiple testing.
    CONCLUSIONS: Our findings support the feasibility and reliability of multi-day remote assessment of speech acoustics in cognitively unimpaired individuals with and without Aβ-pathology, which lays the foundation for the use of speech biomarkers in the context of early AD.
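
As an illustration of the test-retest step, the sketch below computes one common form of the intraclass correlation coefficient. The abstract does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)) is an assumption here, as are the function name `icc_2_1` and the toy scores.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per participant, one value per session.
    The choice of ICC form is an assumption; the abstract does not specify it.
    """
    n = len(scores)        # participants
    k = len(scores[0])     # sessions (e.g. test and retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-participant mean square
    msc = ss_cols / (k - 1)                # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical averaged feature values for three participants, two sessions
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_2_1(identical), icc_2_1(shifted))
```

Because this form measures absolute agreement, a constant shift between sessions lowers the coefficient even when the ranking of participants is unchanged.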

  • Article type: Journal Article
    A computational neuromuscular control system that generates lung pressure and three intrinsic laryngeal muscle activations (cricothyroid, thyroarytenoid, and lateral cricoarytenoid) to control the vocal source was developed. In the current study, LeTalker, a biophysical computational model of the vocal system, was used as the physical plant. In LeTalker, a three-mass vocal fold model was used to simulate self-sustained vocal fold oscillation. A constant /ə/ vowel was used for the vocal tract shape. The trachea was modeled after MRI measurements. The neuromuscular control system generates control parameters to achieve four acoustic targets (fundamental frequency, sound pressure level, normalized spectral centroid, and signal-to-noise ratio) and four somatosensory targets (vocal fold length and longitudinal fiber stress in the three vocal fold layers). The deep-learning-based control system comprises one acoustic feedforward controller and two feedback (acoustic and somatosensory) controllers. Fifty thousand steady speech signals were generated using LeTalker for training the control system. The results demonstrated that the control system was able to generate the lung pressure and the three muscle activations such that the four acoustic and four somatosensory targets were reached with high accuracy. After training, the motor command corrections from the feedback controllers were minimal compared to the feedforward controller, except for thyroarytenoid muscle activation.

  • Article type: Journal Article
    This study aimed to assess the voice quality of patients with temporomandibular disorders (TMDs) compared with healthy subjects using cepstral analysis and to investigate the relationship between TMD severity and the values of cepstral analysis.
    Subjects who met the inclusion criteria completed a general health questionnaire and the Fonseca Anamnestic Index (FAI). Patients identified as having TMDs by the FAI were examined according to the Diagnostic Criteria for Temporomandibular Disorders. The final sample included 65 subjects: 31 TMD patients (mean age ± standard deviation 36.64 ± 13.67 years) and 34 healthy individuals in the control group (mean age ± standard deviation 30.35 ± 7.78 years). Cepstral Peak Prominence (CPP) and Smoothed Cepstral Peak Prominence (CPPS) of a sustained vowel and connected speech were computed using Praat software.
    TMD patients showed lower cepstral values and lower voice quality than the control group. Significant differences were found between the TMD and control groups for all cepstral parameters (P < .001), and cepstral measurements showed a moderate to strong negative correlation with TMD severity (P < .001, rho = -0.57 to -0.88).
    The outcomes of the present study indicate that cepstral analysis can accurately distinguish the reduced voice quality of TMD patients from normal voice.
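
The reported rho values suggest a rank correlation between cepstral measures and TMD severity. A minimal sketch of Spearman's rho follows, assuming that is the statistic meant. The function names and the toy severity and CPPS values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical example: higher severity scores paired with lower CPPS values
severity = [20, 35, 50, 70]
cpps = [14.1, 12.8, 11.0, 9.5]
print(spearman_rho(severity, cpps))
```

A negative rho, as in the abstract, means that participants with higher severity scores tend to have lower cepstral values.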

  • Article type: Journal Article
    The Kam language has experienced historical tonal splits, resulting in the development of a complex tonal system. However, there is still limited knowledge regarding the acoustic characteristics associated with aspiration-based tone splitting. This study aims to investigate the acoustic cues related to the tonal registers and laryngeal configurations in Donglei Kam, a dialect of Southern Kam. Sixteen native speakers of Donglei Kam participated, producing lexical tones. Statistical analyses were conducted to examine the acoustic distinctions between tonal registers, using measurements of voice onset time, spectral tilt, noise, and energy. The results indicated that Donglei Kam retained a two-way contrast of aspiration, albeit with a trend toward gradual loss. Additionally, a breathy voice was detected in the Ciyin tonal register, characterized by elevated spectral tilt values and spectral noise throughout the vowels. Moreover, machine learning classifiers effectively identified tonal registers using voice-quality data, suggesting that the phonation contrast between breathy and modal voice could contribute to the tonal split alongside pitch contrast. In summary, these findings enhance our understanding of the acoustic implementation of breathiness in Kam and offer valuable insights into the role of laryngeal contrast in tonal splits.

  • Article type: Journal Article
    Abnormal speech prosody has been widely reported in individuals with autism. Many studies of children and adults with autism spectrum disorder who speak a non-tonal language have shown deficits in using prosodic cues to mark focus. However, focus marking by autistic children who speak a tonal language has rarely been examined. Cantonese-speaking children may face additional difficulties because tonal languages require them to use prosodic cues to achieve multiple functions simultaneously, such as lexical contrast and focus marking. This study bridges this research gap by acoustically evaluating the use of Cantonese speech prosody to mark information structure by Cantonese-speaking children with and without autism spectrum disorder. We designed speech production tasks to elicit natural broad and narrow focus production among these children in sentences with different tone combinations. Acoustic correlates of prosodic focus marking, such as f0, duration, and intensity of each syllable, were analyzed to examine the effects of participant group, focus condition, and lexical tone. Our results showed differences in focus marking patterns between Cantonese-speaking children with and without autism spectrum disorder. The autistic children not only showed insufficient on-focus expansion in terms of f0 range and duration when marking focus, but also produced less distinctive tone shapes in general. There was no evidence that prosodic complexity (i.e., sentences with single tones or combinations of tones) significantly affected focus marking in these autistic children or their typically developing (TD) peers.

  • Article type: Journal Article
    Linguistic prosody is affected in Parkinson's disease (PD), which implicates the basal ganglia's role in the production of prosody. However, there is no recent systematic synthesis of the available acoustic evidence of prosodic impairment in PD. This study aimed to identify the acoustic features of linguistic prosody that are consistently affected in PD.
    The authors systematically reviewed articles that reported acoustic features of prosodic production in PD. Articles focused on fundamental frequency (F0) and its variability, intensity and its variability, speech and articulation rate, and pause duration and ratio. From a total of 648 records identified, 36 met criteria for inclusion and exclusion. For each acoustic measurement and task, data from people with PD (PwPD) were compared with those from controls to extract effect sizes. Pooled effect sizes were estimated using robust Bayesian hierarchical regression models.
    PD was associated with decreased F0 variability and increased pause duration. There was limited evidence of reduced intensity variability and speech rate in PwPD. No evidence was found to suggest that PD affects articulation rate or pause ratio.
    The primary acoustic parameters of prosody affected by PD are F0 variability and pause duration. The identification of these acoustic parameters has important clinical implications for the selection of PD management strategies. The association of F0 variability and pause duration with PD suggests that the neural circuits controlling these parameters are at least partly shared and might include the basal ganglia. While the current study focused on the phonetic realization of prosodic cues, future studies should examine whether and how PD affects prosody at higher levels of processing.
    https://doi.org/10.23641/asha.25892923.
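
To make the effect-size extraction step concrete, the sketch below computes a standardized mean difference with a small-sample correction (Hedges' g) for one study. The abstract does not name the effect-size metric, so Hedges' g is an assumption, and the input numbers are invented. The pooling itself, which the authors did with robust Bayesian hierarchical regression, is not reproduced here.

```python
import math


def hedges_g(mean_pd, sd_pd, n_pd, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference (PD minus control) with Hedges' correction.

    Assumed metric for illustration; the abstract only says "effect sizes".
    """
    df = n_pd + n_ctrl - 2
    pooled_sd = math.sqrt(
        ((n_pd - 1) * sd_pd ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    cohens_d = (mean_pd - mean_ctrl) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)   # small-sample bias correction
    return cohens_d * correction


# Hypothetical study: F0 variability (semitones) in 20 PwPD and 20 controls
g = hedges_g(mean_pd=10.0, sd_pd=2.0, n_pd=20,
             mean_ctrl=12.0, sd_ctrl=2.0, n_ctrl=20)
print(round(g, 3))
```

With this sign convention, a negative value indicates a lower mean in the PD group than in controls.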

  • Article type: Journal Article
    Using visual spectrographic examination of vowel nasalization to diagnose the syllabic affiliation of phonologically ambisyllabic nasal consonants (e.g., gamma), Durvasula and Huang [(2017). Lang. Sci. 62, 17-36] argued that anticipatory vowel nasalization in these words patterns with word-medial codas. Using nasometry, the current study finds that anticipatory nasalization before monomorphemic and multimorphemic (scammer) ambisyllabic nasals differs from that before word-medial coda nasals (gamble) and word-final nasals (scam), but not from that before other intervocalic nasals. Additionally, vowel nasalization is sensitive to the manner of the preceding phoneme. These findings demonstrate that quantifying anticipatory nasalization using nasometry yields results that differ from those based on visual spectrographic criteria.

  • Article type: Journal Article
    Anticipatory coarticulation is a highly informative cue to upcoming linguistic information: listeners can identify that the word is ben and not bed by hearing the vowel alone. The present study compares the relative performance of human listeners and a self-supervised pre-trained speech model (wav2vec 2.0) in the use of nasal coarticulation to classify vowels. Stimuli consisted of nasalized (from CVN words) and non-nasalized (from CVC words) American English vowels produced by 60 humans and generated in 36 TTS voices. In aggregate, wav2vec 2.0 performance is similar to human listener performance. Broken down by vowel type, both wav2vec 2.0 and listeners perform better on non-nasalized vowels produced naturally by humans. However, for TTS voices, wav2vec 2.0 classifies nasalized vowels more accurately than non-nasalized vowels. Speaker-level patterns reveal that listeners' use of coarticulation is highly variable across talkers. wav2vec 2.0 also shows cross-talker variability in performance. Analyses also reveal differences between listeners and wav2vec 2.0 in the use of multiple acoustic cues in nasalized vowel classification. The findings have implications for understanding how coarticulatory variation is used in speech perception. The results can also provide insight into how neural systems learn to attend to the unique acoustic features of coarticulation.

  • Article type: Journal Article
    This study investigated the effect of different types of phonetic training on potential changes in the production and perception of English vowels by Arabic learners of English. Forty-six Arabic learners of English were randomly assigned to one of three high-variability vowel training programs: Perception Training (High Variability Phonetic Training), Production Training, and a Hybrid Training program (production and perception training). Pre- and post-tests (vowel identification, category discrimination, speech recognition in noise, and vowel production) showed that all training types led to improvements in perception and production. There was some evidence that improvements were linked to training type: learners in the Perception Training condition improved in vowel identification but not vowel production, while those in the Production Training condition showed only small improvements on perceptual tasks but greater improvement in production. However, the effects of training modality were complicated by proficiency: higher-proficiency learners benefited more from training than lower-proficiency learners, regardless of training mode.

  • Article type: Journal Article
    How we produce and perceive voice is constrained by laryngeal physiology and biomechanics. Such constraints may present themselves as principal dimensions in the voice outcome space that are shared among speakers. This study attempts to identify such principal dimensions in the voice outcome space and the underlying laryngeal control mechanisms in a three-dimensional computational model of voice production. A large-scale voice simulation was performed with parametric variations in vocal fold geometry and stiffness, glottal gap, vocal tract shape, and subglottal pressure. Principal component analysis was applied to data combining both the physiological control parameters and voice outcome measures. The results showed three dominant dimensions accounting for at least 50% of the total variance. The first two dimensions describe respiratory-laryngeal coordination in controlling the energy balance between low- and high-frequency harmonics in the produced voice, and the third dimension describes control of the fundamental frequency. The dominance of these three dimensions suggests that voice changes along these principal dimensions are likely to be more consistently produced and perceived by most speakers than other voice changes, and thus are more likely to have emerged during evolution and be used to convey important personal information, such as emotion and larynx size.
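
The statement that three dimensions account for at least 50% of the total variance refers to the explained-variance ratios of a principal component analysis. The sketch below shows how such ratios can be obtained, using power iteration with deflation on the covariance matrix. This is a teaching simplification rather than the authors' pipeline: the function names and toy data are hypothetical, and any standardization of variables before the analysis is left to the caller because the abstract does not describe it.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of observations (rows) by variables (columns)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def explained_variance_ratios(rows, n_components, iterations=1000):
    """Share of total variance carried by each leading principal component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))   # trace = total variance
    ratios = []
    for _component in range(n_components):
        v = [1.0 + i for i in range(p)]        # arbitrary starting vector
        eigenvalue = 0.0
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                   # no variance left to extract
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm
        ratios.append(eigenvalue / total)
        # Deflation: remove the component just found from the covariance
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return ratios


# Hypothetical data: two uncorrelated variables with variances 8/3 and 2/3
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(explained_variance_ratios(data, 2))
```

Summing the first three ratios gives the cumulative share of variance of the kind the abstract reports. In practice a library routine based on singular value decomposition would be preferable to power iteration.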
