Sign language recognition

  • Article type: Journal Article
    Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep-learning-based sign language recognition methods usually rely on large-scale training data and rich supervision. However, current sign language datasets are limited, and they are annotated only at the sentence level rather than the frame level. This inadequate supervision poses a serious challenge for sign language recognition and may result in undertrained recognition models. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition that contains two teacher models and one student model. The first teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The second teacher is the Text2Gloss translation teacher model, which aims to translate a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets (PHOENIX 2014T, CSL-Daily, and QSL); the results show that the proposed cross-modal knowledge distillation method effectively improves sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
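
    As a rough illustration of the two-teacher distillation scheme described above, the student can be trained against temperature-scaled soft labels from both teachers plus the usual hard-label loss. The loss weighting, temperature, and equal averaging of the two teachers below are assumptions for the sketch, not the authors' published configuration:

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, t1_logits, t2_logits,
                              labels, T=4.0, alpha=0.5):
            # Hard-label cross-entropy against the gloss annotations.
            hard = F.cross_entropy(student_logits, labels)
            # Soft targets: average of the two teachers' tempered distributions.
            soft_targets = 0.5 * (F.softmax(t1_logits / T, dim=-1)
                                  + F.softmax(t2_logits / T, dim=-1))
            # KL divergence between student and combined teacher distributions.
            soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                            soft_targets, reduction="batchmean") * T * T
            return alpha * hard + (1.0 - alpha) * soft

        # Toy usage: 8 decoding steps, 120 gloss classes.
        s, t1, t2 = (torch.randn(8, 120) for _ in range(3))
        y = torch.randint(0, 120, (8,))
        print(distillation_loss(s, t1, t2, y))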

  • Article type: Journal Article
    Flexible strain sensors have been widely researched in fields such as smart wearables, human health monitoring, and biomedical applications. However, simultaneously achieving a wide sensing range and high sensitivity in flexible strain sensors remains a challenge, limiting their further application. To address these issues, a cross-scale combinatorial bionic hierarchical design is presented, featuring microscale morphology combined with a macroscale base to balance sensing range and sensitivity. Inspired by the combination of serpentine and butterfly-wing structures, this study employs three-dimensional printing, prestretching, and mold-transfer processes to construct a combinatorial bionic hierarchical flexible strain sensor (CBH-sensor) with serpentine-shaped inverted-V-groove/wrinkling-cracking structures. The CBH-sensor has a wide sensing range of 150% and high sensitivity, with a gauge factor of up to 2416.67. In addition, the CBH-sensor array is demonstrated in sign language gesture recognition, successfully identifying nine different sign language gestures with an impressive accuracy of 100% with the assistance of machine learning. The CBH-sensor shows considerable promise for enabling unobstructed communication between individuals who use sign language and those who do not, and it has wide-ranging possibilities in gesture-driven interaction for human-computer interfaces.
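
    For reference, the gauge factor cited above is conventionally defined as the relative resistance change per unit strain (a standard definition, not a formula quoted from the paper):

        \mathrm{GF} = \frac{\Delta R / R_{0}}{\varepsilon}

    so, at a strain of ε = 0.01 (1%), a gauge factor of 2416.67 would correspond to a relative resistance change ΔR/R_0 of roughly 24, i.e., the resistance changes by about 24 times its baseline value.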

  • Article type: Journal Article
    In recent years, the synaptic properties of transistors have been extensively studied. Compared with transistors based on liquid or organic materials, inorganic solid-electrolyte-gated transistors have the advantage of better chemical stability. This study uses a simple, low-cost solution technology to prepare In2O3 transistors gated by an AlLiO solid electrolyte. The electrochemical performance of the device is achieved by forming an electric double layer and by electrochemical doping, which can mimic basic functions of biological synapses such as excitatory postsynaptic current (EPSC), paired-pulse facilitation (PPF), and spike-timing-dependent plasticity (STDP). Furthermore, complex synaptic behaviors such as Pavlovian classical conditioning and the Morse code for "Qingdao" are successfully emulated. An artificial neural network based on these transistors is built to recognize sign language and enable sign language interpretation, achieving a 95% identification accuracy. Additionally, the handwritten-digit identification accuracy is 94%, and even under various levels of Gaussian noise the recognition rate remains above 84%. These findings demonstrate the potential of In2O3/AlLiO TFTs in shaping the next generation of artificial intelligence.
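
    The paired-pulse facilitation reported above is commonly quantified as the ratio of the second EPSC amplitude to the first (a standard index in the synaptic-device literature, not a formula quoted from this paper):

        \mathrm{PPF} = \frac{A_{2}}{A_{1}} \times 100\%

    where A_1 and A_2 are the amplitudes of the EPSCs evoked by two closely spaced presynaptic spikes; values above 100% indicate facilitation.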

  • Article type: Journal Article
    Finding ways to enable seamless communication between deaf and able-bodied individuals has been a challenging and pressing issue. This paper proposes a solution to this problem by designing a low-cost data glove that uses multiple inertial sensors to achieve efficient and accurate sign language recognition. In this study, four machine learning models (decision tree (DT), support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF)) were employed to recognize 20 types of dynamic sign language used by deaf individuals, and an attention-based bidirectional long short-term memory network (Attention-BiLSTM) was utilized as well. Furthermore, this study verifies the impact of the number and placement of data glove nodes on the accuracy of recognizing complex dynamic sign language. Finally, the proposed method is compared with existing state-of-the-art algorithms on nine public datasets. The results indicate that the Attention-BiLSTM and RF algorithms achieve the highest performance in recognizing the twenty dynamic sign language gestures, with accuracies of 98.85% and 97.58%, respectively. This provides evidence of the feasibility of the proposed data glove and recognition methods. This study may serve as a valuable reference for the development of wearable sign language recognition devices and promote easier communication between deaf and able-bodied individuals.
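
    A hedged sketch of what an attention-augmented BiLSTM classifier for glove sequences can look like; the 36-channel input width, hidden size, and attention form are assumptions for illustration, not the paper's exact architecture:

        import torch
        import torch.nn as nn

        class AttentionBiLSTM(nn.Module):
            # Attention-weighted temporal pooling over BiLSTM hidden states.
            def __init__(self, in_dim=36, hidden=64, n_classes=20):
                super().__init__()
                self.lstm = nn.LSTM(in_dim, hidden, batch_first=True,
                                    bidirectional=True)
                self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
                self.head = nn.Linear(2 * hidden, n_classes)

            def forward(self, x):                       # x: (batch, time, in_dim)
                h, _ = self.lstm(x)                     # (batch, time, 2*hidden)
                w = torch.softmax(self.attn(h), dim=1)  # weights over time
                ctx = (w * h).sum(dim=1)                # weighted pooling
                return self.head(ctx)

        model = AttentionBiLSTM()
        print(model(torch.randn(4, 100, 36)).shape)     # torch.Size([4, 20])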

  • Article type: Journal Article
    Deaf and hearing-impaired people always face communication barriers. Non-invasive surface electromyography (sEMG) sensor-based sign language recognition (SLR) technology can help them better integrate into social life. Since the traditional tandem convolutional neural network (CNN) structure used in most CNN-based studies inadequately captures the features of the input data, we propose a novel inception architecture with a residual module and dilated convolution (IRDC-net) to enlarge the receptive fields and enrich the feature maps, applying it to SLR tasks for the first time. This work first transformed the time-domain signal into the time-frequency domain using the discrete Fourier transform. Second, an IRDC-net was constructed to recognize ten Chinese sign language signs. Third, the tandem CNN networks VGG-net and ResNet-18 were compared with our proposed parallel-structure network, IRDC-net. Finally, the public dataset Ninapro DB1 was used to verify the generalization performance of the IRDC-net. The results showed that, after transforming the time-domain sEMG signal into the time-frequency domain, the classification accuracy increased from 84.29% to 91.70% when using the IRDC-net on our sign language dataset. Furthermore, for the time-frequency information of the public dataset Ninapro DB1, the classification accuracy reached 89.82%, higher than that achieved in other recent studies. As such, our findings contribute to research into SLR tasks and to improving the daily lives of deaf and hearing-impaired people.
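
    A minimal sketch of the parallel idea behind such a block: several dilated-convolution branches widen the receptive field, their outputs are fused inception-style, and a residual connection preserves the input. The branch count and dilation rates here are assumptions, not the published IRDC-net configuration:

        import torch
        import torch.nn as nn

        class IRDCBlock(nn.Module):
            # Inception-style block: parallel dilated branches + residual add.
            def __init__(self, ch):
                super().__init__()
                self.b1 = nn.Conv2d(ch, ch, 3, padding=1, dilation=1)
                self.b2 = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)
                self.b3 = nn.Conv2d(ch, ch, 3, padding=4, dilation=4)
                self.fuse = nn.Conv2d(3 * ch, ch, 1)   # 1x1 channel fusion
                self.act = nn.ReLU()

            def forward(self, x):
                y = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
                return self.act(x + self.fuse(y))      # residual connection

        x = torch.randn(2, 16, 64, 64)   # e.g., batched time-frequency maps
        print(IRDCBlock(16)(x).shape)    # torch.Size([2, 16, 64, 64])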

  • Article type: Journal Article
    With the global spread of the novel coronavirus, avoiding human-to-human contact has become an effective way to cut off the spread of the virus. Contactless gesture recognition has therefore become an effective means of reducing the risk of contact infection in outbreak prevention and control. However, recognizing the everyday sign language behaviors of deaf people remains a challenge for sensing technology. Ubiquitous acoustics offers new ideas on how to perceive everyday behavior: the advantages of a low sampling rate, slow propagation speed, and easily accessible equipment have led to the widespread use of acoustic-signal-based gesture recognition. This paper therefore proposes a contactless gesture and sign language behavior sensing method based on ultrasonic signals, UltrasonicGS. The method uses Generative Adversarial Network (GAN)-based data augmentation to expand the dataset without human intervention and improve the performance of the behavior recognition model. In addition, to handle the inconsistent lengths and difficult alignment of the input and output sequences of continuous gestures and sign language gestures, we add the Connectionist Temporal Classification (CTC) algorithm after the CRNN network. The architecture thus achieves better recognition of the sign language behaviors of certain people, filling a gap in acoustic sensing of Chinese sign language. We conducted extensive experiments and evaluations of UltrasonicGS in a variety of real scenarios. The experimental results show that UltrasonicGS achieves a combined recognition rate of 98.8% for 15 single gestures and average correct recognition rates of 92.4% and 86.3% for six sets of continuous gestures and sign language gestures, respectively. As a result, our proposed method provides a low-cost and highly robust solution for avoiding human-to-human contact.
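
    To illustrate the CTC step named above: CTC lets an unsegmented frame sequence be trained against a shorter label sequence without frame-level alignment. The sizes below are toy values, not UltrasonicGS's real dimensions:

        import torch
        import torch.nn as nn

        T, B, C = 50, 4, 16   # time steps, batch size, classes (incl. blank)
        S = 10                # target label sequence length

        # Per-frame log-probabilities, e.g., a CRNN encoder's output.
        log_probs = torch.randn(T, B, C).log_softmax(dim=2)
        targets = torch.randint(1, C, (B, S), dtype=torch.long)
        input_lengths = torch.full((B,), T, dtype=torch.long)
        target_lengths = torch.full((B,), S, dtype=torch.long)

        ctc = nn.CTCLoss(blank=0)   # class 0 reserved as the CTC blank
        print(ctc(log_probs, targets, input_lengths, target_lengths).item())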

  • Article type: Journal Article
    It is an objective reality that deaf-mute people have difficulty seeking medical treatment. Owing to the lack of sign language interpreters, most hospitals in China currently cannot interpret sign language, and normal medical treatment is a luxury for deaf people. In this paper, we propose a sign language recognition system, Heart-Speaker, applied to deaf-mute consultation scenarios. The system provides a low-cost solution to the difficult problem of treating deaf-mute patients: the doctor simply points the Heart-Speaker at the deaf patient, and the system automatically captures the sign language movements and translates their semantics. When the doctor makes a diagnosis or asks a question, the system displays the corresponding sign language video and subtitles, meeting the need for two-way communication between doctor and patient. The system uses the MobileNet-YOLOv3 model to recognize sign language, meeting the needs of running on embedded terminals while providing favorable recognition accuracy. We performed experiments to verify the system's recognition accuracy; the results show that Heart-Speaker recognizes sign language with an accuracy of 90.77%.

  • Article type: Journal Article
    Sign language is the most important way of communication for hearing-impaired people, and research on sign language recognition can help normal people understand sign language. We reviewed classic sign language recognition methods and found that their accuracy is not high enough because of redundant information, finger occlusion, motion blur, the diverse signing styles of different people, and so on. To overcome these shortcomings, we propose a multi-scale and dual sign language recognition network (SLR-Net) based on a graph convolutional network (GCN). The original input data are RGB videos; we first extract skeleton data from them and then use the skeleton data for sign language recognition. SLR-Net is mainly composed of three sub-modules: a multi-scale attention network (MSA), a multi-scale spatiotemporal attention network (MSSTA), and an attention-enhanced temporal convolution network (ATCN). MSA allows the GCN to learn dependencies between long-distance vertices; MSSTA can directly learn spatiotemporal features; and ATCN allows the GCN to better learn long temporal dependencies. These three attention mechanisms (multi-scale, spatiotemporal, and temporal) are proposed to further improve robustness and accuracy. In addition, a keyframe extraction algorithm is proposed that can greatly improve efficiency at the cost of a small amount of accuracy. Experimental results showed that our method reaches a 98.08% accuracy rate on the CSL-500 dataset with a 500-word vocabulary; even on the challenging DEVISIGN-L dataset with a 2000-word vocabulary, it reaches a 64.57% accuracy rate, outperforming other state-of-the-art sign language recognition methods.
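
    A minimal sketch of the skeleton-based graph convolution underlying such a backbone: joint features are aggregated over a normalized adjacency matrix and then linearly projected. The joint count and the identity adjacency in the toy run are illustrative assumptions:

        import torch
        import torch.nn as nn

        class SkeletonGCNLayer(nn.Module):
            # One spatial graph convolution over skeleton joints.
            def __init__(self, in_dim, out_dim, adj):
                super().__init__()
                self.register_buffer("adj", adj)        # (joints, joints)
                self.proj = nn.Linear(in_dim, out_dim)

            def forward(self, x):                       # x: (batch, time, joints, dim)
                x = torch.einsum("ij,btjd->btid", self.adj, x)  # neighbor mixing
                return torch.relu(self.proj(x))

        J = 25                                          # assumed number of joints
        layer = SkeletonGCNLayer(3, 64, torch.eye(J))   # toy: self-loops only
        out = layer(torch.randn(2, 30, J, 3))           # 2 clips, 30 frames, xyz
        print(out.shape)                                # torch.Size([2, 30, 25, 64])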

  • Article type: Journal Article
    Sign language was designed to allow hearing-impaired people to interact with others. Nonetheless, knowledge of sign language is uncommon in society, which creates a communication barrier with the hearing-impaired community. Many studies of sign language recognition using computer vision (CV) have been conducted worldwide to reduce such barriers. However, this approach is restricted by the visual angle and is highly affected by environmental factors. In addition, CV usually involves machine learning that requires the collaboration of a team of experts and high-cost hardware, which increases the application cost in real-world situations. This study therefore aims to design and implement a smart wearable American Sign Language (ASL) interpretation system using deep learning, applying sensor fusion that "fuses" six inertial measurement units (IMUs). The IMUs are attached to all fingertips and the back of the hand to recognize sign language gestures, so the proposed method is not restricted by the field of view. The study reveals that this model achieves an average recognition rate of 99.81% for dynamic ASL gestures. Moreover, the proposed ASL recognition system can be further integrated with ICT and IoT technology to provide a feasible solution for assisting hearing-impaired people in communicating with others and improving their quality of life.
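
    A toy sketch of the channel-level fusion step described above: streams from the six IMUs are synchronized and stacked into one feature sequence before classification. The assumption of six channels per IMU (3-axis accelerometer plus 3-axis gyroscope) is illustrative, not confirmed by the abstract:

        import numpy as np

        n_frames = 200   # length of one gesture recording (assumed)
        # Placeholder reads: one (frames, 6) array per IMU.
        imu_streams = [np.random.randn(n_frames, 6) for _ in range(6)]
        # "Fusion" by channel-wise concatenation into a single sequence.
        fused = np.concatenate(imu_streams, axis=1)
        print(fused.shape)   # (200, 36), ready for a sequence classifier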

  • Article type: Journal Article
    This work developed an ionic sensor for human motion monitoring by employing durable H-reduced graphene oxide (RGO)/carbon nanotube (CNT)/Ag electrodes and an ionic polymer interlayer. The sensor functions through unbalanced ion transport and accumulation between the two electrodes under applied deformation. The networked structure and stable electrodes provide convenient ion-transport channels and a large ion-accumulation space, resulting in a sensitivity of 2.6 mV in the strain range below 1% and high stability over 6000 bending cycles. Ionic sensors are of intense interest for detecting human activities, which are usually associated with large strain or deformation changes. More importantly, direction identification and spatial deformation recognition are feasible in this work, which benefits the detection of complex multidimensional activities. Here, an integrated smart glove with several sensors mounted on the hand joints displays a distinguished ability to capture the complex geometry of hand configurations. Based on this superior performance, the potential applications of this passive ionic sensor in sign language recognition and human-computer interaction are demonstrated.