Phonetics

  • Article type: Journal Article
    Given an orthographic transcription, forced alignment systems automatically determine boundaries between segments in speech, facilitating the use of large corpora. In the present paper, we introduce a neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). MAPS serves as a testbed for two possible improvements we pursue for forced alignment systems. The first is treating the acoustic model as a tagger rather than a classifier, motivated by the common understanding that segments are not truly discrete and often overlap. The second is an interpolation technique that allows more precise boundaries than the typical 10 ms limit in modern systems. During testing, all system configurations we trained significantly outperformed the state-of-the-art Montreal Forced Aligner at the 10 ms boundary placement tolerance threshold. The greatest difference achieved was a 28.13% relative performance increase. The Montreal Forced Aligner began to slightly outperform our models at around a 30 ms tolerance. We also reflect on the training process for acoustic modeling in forced alignment, highlighting how the output targets for these models do not match phoneticians' conception of similarity between phones, and that reconciling this tension may require rethinking the task and output targets or how speech itself should be segmented.
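
    The tolerance-threshold evaluation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that accuracy is the share of predicted boundaries falling within a given tolerance of the reference boundary, and that a relative increase is computed as (new - baseline) / baseline. The function names and the boundary times are hypothetical and are not taken from the paper or from either aligner.

```python
def tolerance_accuracy(predicted, reference, tolerance_ms):
    """Fraction of predicted boundaries within tolerance_ms of the reference boundary."""
    if len(predicted) != len(reference):
        raise ValueError("boundary lists must be the same length")
    hits = sum(abs(p - r) <= tolerance_ms for p, r in zip(predicted, reference))
    return hits / len(reference)


def relative_increase(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical boundary times in milliseconds (illustrative only, not data from the paper).
reference = [120.0, 310.0, 455.0, 600.0]
system_a = [124.0, 322.0, 450.0, 604.0]
system_b = [133.0, 325.0, 440.0, 606.0]

acc_a = tolerance_accuracy(system_a, reference, 10)
acc_b = tolerance_accuracy(system_b, reference, 10)
gain = relative_increase(acc_a, acc_b)

print(f"10 ms accuracy: A={acc_a:.2f}, B={acc_b:.2f}, relative increase={gain:.1f}%")
print(f"30 ms accuracy: A={tolerance_accuracy(system_a, reference, 30):.2f}, "
      f"B={tolerance_accuracy(system_b, reference, 30):.2f}")
```

    Reporting accuracy at several tolerances, as the abstract does for 10 ms and 30 ms, shows whether one system is better at fine-grained placement while another catches up at looser thresholds.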

  • Article type: Journal Article
    The basic function of the tongue in pronouncing diadochokinesis and other syllables is not fully understood. This study investigates the influence of sound pressure levels and syllables on tongue pressure and muscle activity in 19 healthy adults (mean age: 28.2 years; range: 22-33 years). Tongue pressure and activity of the posterior tongue were measured using electromyography (EMG) when the velar stops /ka/, /ko/, /ga/, and /go/ were pronounced at 70, 60, 50, and 40 dB. Spearman's rank correlation revealed a significant, yet weak, positive association between tongue pressure and EMG activity (ρ = 0.14, p < 0.05). Mixed-effects model analysis showed that tongue pressure and EMG activity significantly increased at 70 dB compared to other sound pressure levels. While syllables did not significantly affect tongue pressure, the syllable /ko/ significantly increased EMG activity (coefficient = 0.048, p = 0.013). Although no significant differences in tongue pressure were observed for the velar stops /ka/, /ko/, /ga/, and /go/, it is suggested that articulation is achieved by altering the activity of both extrinsic and intrinsic tongue muscles. These findings highlight the importance of considering both tongue pressure and muscle activity when examining the physiological factors contributing to sound pressure levels during speech.
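
    As a reminder of what the reported ρ measures, here is a minimal sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. The helper names and the sample values are hypothetical; they are not the study's data, and the study's own analysis software is not stated in the abstract.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical paired measurements (illustrative only).
tongue_pressure = [10, 12, 11, 15, 14]
emg_activity = [0.2, 0.5, 0.3, 0.4, 0.9]

rho = spearman_rho(tongue_pressure, emg_activity)
print(f"rho = {rho:.2f}")
```

    Because the statistic depends only on ranks, it captures any monotonic association, which is one reason it suits measures such as pressure and EMG amplitude that need not be linearly related.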

  • Article type: Journal Article
    This study investigates whether downstep in Japanese is directly triggered by accents. When the pitch height of a word X is lower after an accented word (A) than after an unaccented word (U), X is diagnosed as downstepped. However, this diagnosis involves two confounding factors: the already lowered F0 before X and phonological phrasing. To control for these factors, this study contrasts genitive and nominative case markers and adjusts measurement points. Eight native speakers of Tokyo Japanese participated in a production experiment. The results show six key findings. First, a structure-dependent F0 downtrend was observed in UX. Second, higher F0 peaks with larger initial lowering were observed after accents with a nominative case marker compared to those with a genitive case marker, suggesting a boosting effect by boundaries. Third, larger initial lowering was observed in AX compared to UX, contradicting the notion that X is more compressed in AX due to downstep. Fourth, the paradigmatic difference in F0 height between AX and UX decreases when F0 of X is increased, supporting the view that boundaries trigger downstep. Fifth, downstep is not physiologically constrained but is phonologically controlled. Finally, the blocking of initial lowering in heavy syllables is not phonological but rather an articulatory phenomenon.
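
    The conventional diagnostic described at the start of this abstract (X counts as downstepped when its pitch is lower after an accented word than after an unaccented one) can be written as a small sketch. It assumes the comparison is made on mean F0 peak values of X; the function names and the values in Hz are hypothetical, and the sketch deliberately ignores the two confounds the study sets out to control.

```python
from statistics import mean


def downstep_difference(f0_x_after_unaccented, f0_x_after_accented):
    """Mean F0 of X in UX minus mean F0 of X in AX, in Hz (positive means X is lower in AX)."""
    return mean(f0_x_after_unaccented) - mean(f0_x_after_accented)


def is_downstepped(f0_x_after_unaccented, f0_x_after_accented):
    """Naive diagnostic: X is downstepped if it is lower after A than after U."""
    return downstep_difference(f0_x_after_unaccented, f0_x_after_accented) > 0


# Hypothetical F0 peaks of word X in Hz (illustrative only).
ux_peaks_hz = [201.0, 197.5, 204.0]
ax_peaks_hz = [182.0, 176.5, 179.0]

diff = downstep_difference(ux_peaks_hz, ax_peaks_hz)
print(f"UX - AX = {diff:.1f} Hz, downstepped: {is_downstepped(ux_peaks_hz, ax_peaks_hz)}")
```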

  • Article type: Journal Article
    Predictions of gradient degree of lenition of voiceless and voiced stops in a corpus of Argentine Spanish are evaluated using three acoustic measures (minimum and maximum intensity velocity and duration) and two recurrent neural network (Phonet) measures (posterior probabilities of sonorant and continuant phonological features). While mixed and inconsistent predictions were obtained across the acoustic metrics, sonorant and continuant probability values were consistently in the direction predicted by known factors of a stop's lenition with respect to its voicing, place of articulation, and surrounding contexts. The results suggest the effectiveness of Phonet as an additional or alternative method of lenition measurement. Furthermore, this study has enhanced the accessibility of Phonet by releasing the trained Spanish Phonet model used in this study and a pipeline with step-by-step instructions for training new models and running inference with them.

  • Article type: Journal Article
    High vowels have higher F0 than low vowels, creating a context effect on the interpretation of F0. Since onset F0 is a cue to stop voicing, the vowel context is expected to influence voicing judgements. Listeners categorized syllables starting with high ("bee"-"pea") and low ("bye"-"pie") vowels varying orthogonally in VOT and onset F0. Listeners made use of both cues as expected. Furthermore, vowel height affected listeners' categorization. Syllables with the low vowel /a/ elicited more voiceless responses compared to syllables with the high vowel /i/. This suggests that listeners compensate for vowel intrinsic effects when making other phonemic judgements.

  • Article type: Journal Article
    A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic phonetic features; 2) source and filter features do not provide substantially complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCC-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.

  • Article type: Journal Article
    While many studies focus on segmental variation in Parkinsonian speech, little is known about prosodic modulations reflecting the ability to adapt to communicative demands in people with Parkinson's disease (PwPD). This type of prosodic modulation is important for social interaction, and it involves modifications in speech melody (intonational level) and articulation of consonants and vowels (segmental level). The present study investigates phonetic cues of prosodic modulations with respect to different focus structures in mildly dysarthric PwPD as a function of levodopa. Acoustic and kinematic speech parameters of 25 PwPD were assessed in two motor conditions. Speech production data from PwPD were collected before (medication-OFF) and after levodopa intake (medication-ON) by means of 3-D electromagnetic articulography. On the acoustic level, intensity, pitch, and syllable durations were analyzed. On the kinematic level, movement duration and amplitude were investigated. Spatio-temporal modulations of speech parameters were examined and compared across three different prosodic focus structures (out-of-focus, broad focus, contrastive focus) to display varying speech demands. Overall, levodopa had beneficial effects on motor performance, speech loudness, and pitch modulation. Acoustic syllable durations and kinematic movement durations did not change, revealing no systematic effects of motor status on the temporal domain. In contrast, there were spatial modulations of the oral articulators: tongue tip movements were smaller and lower lip movements were larger in amplitude under levodopa, reflecting more agile and efficient articulatory movement. Thus, respiratory-phonatory functions and consonant production improved, while syllable duration and tongue body kinematics did not change. Interestingly, prominence marking strategies were comparable between the medication conditions under investigation and, in fact, appear to be preserved in mildly dysarthric PwPD.

  • Article type: Journal Article
    In this study, a computer-driven, phoneme-agnostic method was explored for assessing speech disorders (SDs) in children, bypassing traditional labor-intensive phonetic transcription. Using the SpeechMark® automatic syllabic cluster (SC) analysis, which detects sequences of acoustic features that characterize well-formed syllables, 1,952 American English utterances of 60 preschoolers from two dialectal areas were analyzed [16 with speech disorder present (SD-P) and 44 with speech disorder not present (SD-NP)]. A four-factor regression analysis evaluated the robustness of seven automated measures produced by SpeechMark® and their interactions. SCs significantly predicted SD status (p < 0.001). A secondary analysis using a generalized linear model with a negative binomial distribution evaluated the number of SCs produced by the groups. Results highlighted that children with SD-P produced fewer well-formed clusters [incidence rate ratio (IRR) = 0.8116, p ≤ 0.0137]. The interaction between speech group and age indicated that the effect of age on syllable count was more pronounced in children with SD-P (IRR = 1.0451, p = 0.0251), suggesting that even small changes in age can have a significant effect on SCs. In conclusion, speech status significantly influences the degree to which preschool children produce acoustically well-formed SCs, suggesting the potential for SCs to be speech biomarkers for SD in preschoolers.
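
    The incidence rate ratios quoted in this abstract can be read with a small sketch. It relies on the standard relation for count models with a log link, where IRR = exp(coefficient), so an IRR below 1 corresponds to a lower expected count. The function names are hypothetical, and the sketch only converts the two reported IRR values; it does not reproduce the study's model.

```python
from math import exp, log


def irr_to_percent_change(irr):
    """Percent change in the expected count implied by an incidence rate ratio."""
    return (irr - 1.0) * 100.0


def irr_to_coefficient(irr):
    """Log-scale model coefficient corresponding to an IRR (IRR = exp(coefficient))."""
    return log(irr)


# IRR values as reported in the abstract.
group_irr = 0.8116            # SD-P relative to SD-NP
age_interaction_irr = 1.0451  # speech group by age interaction

for label, irr in [("group", group_irr), ("group x age", age_interaction_irr)]:
    print(f"{label}: IRR={irr}, coefficient={irr_to_coefficient(irr):.4f}, "
          f"change={irr_to_percent_change(irr):+.2f}%")
```

    Read this way, the group IRR of 0.8116 corresponds to roughly 19% fewer well-formed clusters for the SD-P group, holding the other model terms fixed.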

  • Article type: Journal Article
    Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a more discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete vs. continuous hearing, respectively. Behaviorally, identification curves were steeper under 2AFC vs. VAS categorization but were relatively immune to noise, suggesting robust access to abstract, phonetic categories even under signal degradation. Behavioral slopes were correlated with listeners' QuickSIN scores; shallower slopes corresponded with better speech-in-noise performance, suggesting a perceptual advantage for noise-degraded speech comprehension conferred by a more gradient listening strategy. At the neural level, P2 amplitudes and latencies of the ERPs were modulated by task and noise; VAS responses were larger and showed greater noise-related latency delays than 2AFC responses. More gradient responders had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy modulates the categorical organization of speech and behavioral success, with more continuous/gradient listening being advantageous to sentential speech-in-noise perception.

  • Article type: Journal Article
    Vowelless words are exceptionally typologically rare, though they are found in some languages, such as Tashlhiyt (e.g., fkt 'give it'). The current study tests whether lexicons containing tri-segmental (CCC) vowelless words are more difficult for adult English speakers to acquire from brief auditory exposure than lexicons not containing vowelless words. The role of acoustic-phonetic form in learning these typologically rare word forms is also explored: in Experiment 1, participants were trained on words produced in either Clear speech only or Casual speech only; Experiment 2 trained participants on lexical items produced in both speech styles. Listeners were able to learn vowelless and voweled lexicons equally well when speaking style was consistent for participants, but learning was lower for vowelless lexicons when training consisted of variable acoustic-phonetic forms. In both experiments, responses to a post-training wordlikeness ratings task containing novel items revealed that exposure to a vowelless lexicon leads participants to accept new vowelless words as acceptable lexical forms. These results demonstrate that one of the typologically rarest types of lexical forms, words without vowels, can be rapidly acquired by naive adult listeners. Yet, acoustic-phonetic variation modulates learning.