Machine learning models were trained and evaluated on stuttered speech corpora segmented with either an interval-based or an event-based method. The models used acoustic features extracted from the speech signal and linguistic features extracted from transcriptions generated by a state-of-the-art automatic speech recognition (ASR) system.
Event-based segmentation yielded better ARS performance than interval-based segmentation, as measured by the area under the receiver operating characteristic curve (AUC). The results also suggest that the segmentation method affects both the quality and the quantity of the resulting data. Adding linguistic features improved the detection of whole-word repetitions but not of other stutter types.
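The AUC summarizes how well a detector's scores rank stuttered segments above fluent ones. It equals the probability that a randomly chosen stuttered segment receives a higher score than a randomly chosen fluent segment, with ties counted as half. The sketch below computes it from that pairwise definition. It is an illustration only, not the evaluation code used in this study, and the labels and scores are invented toy values.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via its pairwise (Mann-Whitney) definition.

    labels: 1 for a stuttered segment, 0 for a fluent segment.
    scores: detector outputs, where higher means "more likely stuttered".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    # Compare every stuttered score with every fluent score.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # correctly ranked pair
        elif p == n:
            wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three stuttered and three fluent segments.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(labels, scores))
```

For these toy values, 7 of the 9 stuttered/fluent pairs are ranked correctly, giving an AUC of 7/9 ≈ 0.78. In practice, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.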
These findings suggest that event-based segmentation is better suited to ARS than interval-based segmentation because it preserves the exact boundaries and types of stutters. Linguistic features help separate supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, to develop effective ARS systems.
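To illustrate why event-based segmentation preserves stutter boundaries and types, the sketch below contrasts the two approaches on the same hypothetical annotation. The interval-based function cuts the recording into fixed-length windows and labels each window with every stutter type that overlaps it. The event-based function turns each annotated event into its own segment. The `Event` class, the function names, the 3-second window, and the event times are all assumptions for illustration, not the corpora or settings used in this study. Fluent segments, which an event-based corpus would also need as negatives, are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stutter annotation."""
    start: float  # onset in seconds
    end: float    # offset in seconds
    kind: str     # stutter type, e.g. "block"


def interval_segments(events, duration, window=3.0):
    """Fixed-length windows; each is labeled with all stutter types overlapping it."""
    segments = []
    t = 0.0
    while t < duration:
        end = min(t + window, duration)
        # An event overlaps [t, end) if it starts before the window ends
        # and ends after the window starts.
        kinds = sorted({e.kind for e in events if e.start < end and e.end > t})
        segments.append((t, end, kinds))
        t = end
    return segments


def event_segments(events):
    """One segment per annotated event, keeping its exact boundaries and type."""
    return [(e.start, e.end, [e.kind]) for e in sorted(events, key=lambda e: e.start)]


events = [Event(1.2, 1.9, "word repetition"), Event(2.8, 3.6, "block")]
for seg in interval_segments(events, duration=6.0):
    print("interval", seg)
for seg in event_segments(events):
    print("event   ", seg)
```

With these values, the block spanning 2.8 to 3.6 s is split across both 3-second windows, and the first window mixes two stutter types under one label. The event-based segments keep each stutter intact with a single type.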