MeSH: Humans; Wearable Electronic Devices; Speech; Language; Lip / physiology; Movement; Male; Female; Adult; Lipreading; Motion Capture

Source: DOI:10.1126/sciadv.ado9576   PDF (PubMed)

Abstract:
Lip language recognition urgently needs wearable and easy-to-use interfaces for interference-free and high-fidelity lip-reading acquisition, along with accompanying data-efficient decoder-modeling methods. Existing solutions suffer from unreliable lip reading, are data hungry, and generalize poorly. Here, we propose a wearable lip language decoding technology that enables interference-free and high-fidelity acquisition of lip movements and data-efficient recognition of fluent lip language, based on wearable motion capture and continuous lip speech movement reconstruction. The method allows us to artificially generate any desired continuous speech dataset from a very limited corpus of word samples from users. By using these artificial datasets to train the decoder, we achieve an average accuracy of 92.0% across individuals (n = 7) for actual continuous and fluent lip speech recognition of 93 English sentences, while observing no training burden on users because all training datasets are artificially generated. Our method greatly minimizes users' training/learning load and presents a data-efficient and easy-to-use paradigm for lip language recognition.
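The data-efficiency idea in the abstract, synthesizing continuous-sentence training data by stitching together a small corpus of word-level lip-motion recordings, can be illustrated with a minimal sketch. The dictionary, array shapes, and cross-fade blending below are illustrative assumptions, not the authors' actual reconstruction method.

```python
import numpy as np

# Hypothetical word-level corpus: each word maps to a recorded lip-motion
# sequence of shape (frames, channels), standing in for real motion-capture data.
word_corpus = {
    "open": np.random.randn(20, 6),
    "the":  np.random.randn(10, 6),
    "door": np.random.randn(18, 6),
}

def blend(a, b, overlap=4):
    """Cross-fade the last `overlap` frames of `a` into the first `overlap`
    frames of `b` so the concatenated motion stays continuous."""
    w = np.linspace(0.0, 1.0, overlap)[:, None]
    joint = (1.0 - w) * a[-overlap:] + w * b[:overlap]
    return np.concatenate([a[:-overlap], joint, b[overlap:]], axis=0)

def synthesize_sentence(words, corpus, overlap=4):
    """Build one artificial continuous lip-motion sequence from word samples."""
    seq = corpus[words[0]]
    for w in words[1:]:
        seq = blend(seq, corpus[w], overlap)
    return seq

# Generate a synthetic training example for the sentence "open the door".
motion = synthesize_sentence(["open", "the", "door"], word_corpus)
print(motion.shape)  # one artificial continuous sample, (total_frames, 6)
```

In the paradigm described by the abstract, many such artificially generated sentence-level samples would be paired with their labels to train the decoder, so users never need to record continuous speech themselves.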