MeSH: Humans; Black or African American; Adult; Child; Speech Acoustics; Male; Female; Speech Production Measurement / methods; Language; Child, Preschool; Young Adult; Speech Perception; Adolescent; Phonetics; Child Language

Source: DOI: 10.1121/10.0025771

Abstract:
This paper evaluates an innovative framework for predicting spoken dialect density in children's and adults' African American English. A speaker's dialect density is defined as the frequency with which dialect-specific language characteristics occur in their speech. Rather than treating the presence or absence of a target dialect in a user's speech as a binary decision, a classifier is trained to predict the level of dialect density, which provides greater specificity for downstream tasks. To this end, self-supervised learning representations from HuBERT, handcrafted grammar-based features extracted from ASR transcripts, prosodic features, and other feature sets are evaluated as inputs to an XGBoost classifier, which is then trained to assign dialect density labels to short recorded utterances. The approach achieves high dialect density level classification accuracy for both child and adult speech and performs robustly across age groups and regional varieties of the dialect. Additionally, this work serves as a basis for analyzing which acoustic and grammatical cues affect machine perception of dialect.
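The abstract defines dialect density only as a frequency and says the classifier predicts a density level rather than a yes/no label. The sketch below illustrates that framing under stated assumptions: density is computed per utterance as annotated dialect-feature occurrences divided by word count, and the continuous value is then binned into ordered levels that would serve as classifier targets. The function names and the thresholds in `LEVEL_BOUNDARIES` are hypothetical and are not taken from the paper; the HuBERT, grammar, and prosodic feature extraction and the XGBoost training step are not shown.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical level boundaries on the density scale (NOT from the paper):
# density < 0.05 -> level 0, 0.05 <= density < 0.15 -> level 1, >= 0.15 -> level 2.
LEVEL_BOUNDARIES = (0.05, 0.15)


def dialect_density(num_dialect_features: int, num_words: int) -> float:
    """Assumed definition: dialect-specific feature occurrences per word."""
    if num_words <= 0:
        raise ValueError("utterance must contain at least one word")
    if num_dialect_features < 0:
        raise ValueError("feature count cannot be negative")
    return num_dialect_features / num_words


def density_level(density: float,
                  boundaries: Sequence[float] = LEVEL_BOUNDARIES) -> int:
    """Map a continuous density to a discrete level (0 = lowest).

    bisect_right counts how many boundaries are <= density, so a value
    exactly on a boundary falls into the higher level.
    """
    return bisect_right(boundaries, density)


if __name__ == "__main__":
    # Hypothetical annotated utterances: (dialect feature count, word count).
    utterances = [(0, 12), (1, 10), (3, 8)]
    for feats, words in utterances:
        d = dialect_density(feats, words)
        print(f"{feats}/{words} words -> density {d:.3f}, level {density_level(d)}")
```

With these assumed thresholds, the three example utterances map to levels 0, 1, and 2. Binning into ordered levels, rather than a present/absent flag, is what lets a downstream component distinguish a lightly accented utterance from a heavily dialectal one.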