Keywords: American sign language; deep learning models; wearable inertial sensors

MeSH: Humans; United States; Sign Language; Motion Capture; Neurons; Perception; Wearable Electronic Devices

Source: DOI: 10.3390/s24020453 PDF (PubMed)

Abstract:
Sign language is a natural communication method used to convey messages within the deaf community. In research on sign language recognition with wearable sensors, data sources are limited and the data acquisition process is complex. This study aims to collect an American Sign Language dataset with a wearable inertial motion capture system and to realize recognition and end-to-end translation of sign language sentences with deep learning models. In this work, a dataset of 300 commonly used sentences is gathered from 3 volunteers. The recognition network consists mainly of three components: a convolutional neural network, a bi-directional long short-term memory network, and connectionist temporal classification. The model achieves accuracy rates of 99.07% in word-level evaluation and 97.34% in sentence-level evaluation. The translation network is an encoder-decoder model based mainly on long short-term memory with global attention. The word error rate of end-to-end translation is 16.63%. The proposed method has the potential to recognize more sign language sentences given reliable inertial data from the device.
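The recognition pipeline described above (CNN front end, bi-directional LSTM, CTC output) can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the paper's implementation: the channel count, hidden size, vocabulary size, and kernel sizes are assumptions chosen for demonstration.

```python
import torch
import torch.nn as nn

class SignRecognizer(nn.Module):
    """Hypothetical sketch of a CNN + BiLSTM + CTC recognition network.

    All hyperparameters below (66 inertial channels, hidden size 128,
    vocabulary of 40 sign words) are illustrative assumptions, not the
    values used in the paper.
    """

    def __init__(self, in_channels=66, hidden=128, vocab_size=40):
        super().__init__()
        # 1-D convolutions extract local features over the time axis
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Bi-directional LSTM models temporal context in both directions
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)
        # Linear projection to per-frame class scores; +1 for the CTC blank
        self.proj = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, x):
        # x: (batch, time, channels) -> Conv1d expects (batch, channels, time)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.bilstm(h)
        # Per-frame log-probabilities, as required by nn.CTCLoss
        return self.proj(h).log_softmax(-1)

model = SignRecognizer()
x = torch.randn(2, 100, 66)   # 2 sequences, 100 frames, 66 inertial channels
log_probs = model(x)          # shape: (2, 100, 41)
```

At training time, `log_probs` (transposed to time-major) would be fed to `nn.CTCLoss` together with the target word sequences, which lets the network align unsegmented sensor frames to sign labels without frame-level annotation.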