Sign language recognition

  • Article type: Journal Article
    Continuous Sign Language Recognition (CSLR) is the task of converting a sign language video into a gloss sequence. Existing deep-learning-based sign language recognition methods usually rely on large-scale training data and rich supervised information. However, current sign language datasets are limited, and they are annotated only at the sentence level rather than the frame level. This inadequate supervision poses a serious challenge for sign language recognition and may result in insufficiently trained recognition models. To address these problems, we propose a cross-modal knowledge distillation method for continuous sign language recognition that contains two teacher models and one student model. One teacher is the Sign2Text dialogue teacher model, which takes a sign language video and a dialogue sentence as input and outputs the sign language recognition result. The other is the Text2Gloss translation teacher model, which aims to translate a text sentence into a gloss sequence. Both teacher models provide information-rich soft labels to assist the training of the student model, a general sign language recognition model. We conduct extensive experiments on multiple commonly used sign language datasets, i.e., PHOENIX 2014T, CSL-Daily, and QSL. The results show that the proposed cross-modal knowledge distillation method effectively improves sign language recognition accuracy by transferring multi-modal information from the teacher models to the student model. Code is available at https://github.com/glq-1992/cross-modal-knowledge-distillation_new.
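    A minimal sketch of the soft-label distillation idea described above, written in PyTorch: the student's logits are pulled toward temperature-scaled distributions from the Sign2Text and Text2Gloss teachers while still fitting the ground-truth labels. The temperature T, the weights alpha and beta, and the use of plain cross-entropy in place of the sequence-level (CTC-style) losses typical of CSLR are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_s2t_logits, teacher_t2g_logits,
                      targets, T=2.0, alpha=0.4, beta=0.2):
    """Blend the hard-label loss with soft-label terms from the two teachers.

    T, alpha and beta are illustrative hyperparameters; plain cross-entropy
    stands in for the sequence-level losses actually used in CSLR.
    """
    # Standard cross-entropy against the ground-truth gloss labels
    hard = F.cross_entropy(student_logits, targets)
    # Soft-label terms: KL divergence between temperature-scaled distributions
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    soft_s2t = F.kl_div(log_p_student, F.softmax(teacher_s2t_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
    soft_t2g = F.kl_div(log_p_student, F.softmax(teacher_t2g_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
    return (1.0 - alpha - beta) * hard + alpha * soft_s2t + beta * soft_t2g

# Toy usage: 8 samples, 100 gloss classes
logits = [torch.randn(8, 100) for _ in range(3)]
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(logits[0], logits[1], logits[2], labels)
```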

  • Article type: Journal Article
    Hand gestures are a natural and intuitive form of communication, and integrating this communication method into robotic systems presents significant potential to improve human-robot collaboration. Recent advances in motor neuroscience have focused on replicating human hand movements from synergies, also known as movement primitives. Synergies, the fundamental building blocks of movement, serve as a potential strategy adopted by the central nervous system to generate and control movements. Identifying how synergies contribute to movement can help in the dexterous control of robots, exoskeletons, and prosthetics, and extend these applications to rehabilitation. In this paper, 33 static hand gestures were recorded through a single RGB camera and identified in real time through the MediaPipe framework as participants made various postures with their dominant hand. Assuming an open palm as the initial posture, uniform joint angular velocities were obtained from all these gestures. By applying a dimensionality reduction method, kinematic synergies were obtained from these joint angular velocities. Kinematic synergies that explain 98% of the variance of the movements were utilized to reconstruct new hand gestures using convex optimization. The reconstructed hand gestures and selected kinematic synergies were translated onto a humanoid robot, Mitra, in real time as the participants demonstrated various hand gestures. The results showed that by using only a few kinematic synergies it is possible to generate various hand gestures, with 95.7% accuracy. Furthermore, utilizing low-dimensional synergies to control high-dimensional end effectors holds promise for enabling near-natural human-robot collaboration.
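    The synergy-extraction step can be illustrated with a short PCA sketch: joint angular velocities are decomposed into components that explain 98% of the variance, and a new gesture is reconstructed as a weighted sum of those components. The 21-dimensional joint layout, the placeholder data, and the plain least-squares reconstruction (the paper uses convex optimization for this step) are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows: gesture samples, columns: joint angular velocities (illustrative shape)
velocities = np.random.rand(200, 21)           # placeholder data, 21 joint angles assumed

# Keep the smallest number of components explaining 98% of the variance
pca = PCA(n_components=0.98)
scores = pca.fit_transform(velocities)
synergies = pca.components_                    # kinematic synergies (basis vectors)

# Reconstruct a new gesture as a linear combination of the synergies.
# The paper solves this with convex optimization; a plain least-squares
# projection is used here for brevity.
new_gesture = np.random.rand(21)
weights, *_ = np.linalg.lstsq(synergies.T, new_gesture - pca.mean_, rcond=None)
reconstruction = pca.mean_ + synergies.T @ weights
```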

  • Article type: Journal Article
    The aim of this study is to develop a practical software solution for real-time recognition of sign language words using two arms. This will facilitate communication between hearing-impaired individuals and those who can hear. We are aware of several sign language recognition systems developed using different technologies, including cameras, armbands, and gloves. However, the system we propose in this study stands out for its practicality, utilizing surface electromyography (muscle activity) and inertial measurement unit (motion dynamics) data from both arms. We address the drawbacks of other methods, such as high costs, low accuracy due to ambient light and obstacles, and complex hardware requirements, which have limited their practical application. Our software can run on different operating systems using digital signal processing and machine learning methods specific to this study. For the test, we created a dataset of 80 words based on their frequency of use in daily life and performed a thorough feature extraction process. We tested the recognition performance using various classifiers and parameters and compared the results. The random forest algorithm emerged as the most successful, achieving a remarkable 99.875% accuracy, while the naïve Bayes algorithm had the lowest success rate with 87.625% accuracy. The new system promises to significantly improve communication for people with hearing disabilities and ensures seamless integration into daily life without compromising user comfort or lifestyle quality.
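    A minimal sketch of the classification stage with a random forest, assuming a precomputed feature matrix of sEMG and IMU features from both arms; the feature dimensionality, sample counts, and classifier settings below are placeholders, not values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder feature matrix: one row per signed word, columns are sEMG/IMU
# features extracted from both arms (sizes here are assumptions).
X = np.random.rand(800, 64)
y = np.random.randint(0, 80, size=800)         # 80 sign-language words

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```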

  • Article type: Journal Article
    Flexible strain sensors have been widely researched in fields such as smart wearables, human health monitoring, and biomedical applications. However, simultaneously achieving a wide sensing range and high sensitivity in flexible strain sensors remains a challenge, limiting their further applications. To address these issues, a cross-scale combinatorial bionic hierarchical design is presented, featuring microscale morphology combined with a macroscale base to balance the sensing range and sensitivity. Inspired by the combination of serpentine and butterfly wing structures, this study employs three-dimensional printing, prestretching, and mold transfer processes to construct a combinatorial bionic hierarchical flexible strain sensor (CBH-sensor) with serpentine-shaped inverted-V-groove/wrinkling-cracking structures. The CBH-sensor has a wide sensing range of up to 150% strain and high sensitivity, with a gauge factor of up to 2416.67. In addition, the CBH-sensor array is applied to sign language gesture recognition, successfully identifying nine different sign language gestures with an impressive accuracy of 100% with the assistance of machine learning. The CBH-sensor exhibits considerable promise for enabling unobstructed communication between individuals who use sign language and those who do not. Furthermore, it has wide-ranging possibilities for use in the field of gesture-driven interactions in human-computer interfaces.
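    The quoted sensitivity can be interpreted through the conventional gauge-factor definition for resistive strain sensors (a standard relation, not an equation given in the abstract):

```latex
\mathrm{GF} = \frac{\Delta R / R_{0}}{\varepsilon}
```

    where ΔR is the resistance change, R0 the unstrained resistance, and ε the applied strain; a gauge factor of 2416.67 means the relative resistance change is roughly 2400 times the applied strain over the strain interval in which that value holds.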

  • Article type: Journal Article
    Addressing the increasing demand for accessible sign language learning tools, this paper introduces an innovative Machine Learning-Driven Web Application dedicated to Sign Language Learning. This web application represents a significant advancement in sign language education. Unlike traditional approaches, the application's unique methodology involves assigning users different words to spell. Users are tasked with signing each letter of the word, earning a point upon correctly signing the entire word. The paper delves into the development, features, and the machine learning framework underlying the application. Developed using HTML, CSS, JavaScript, and Flask, the web application seamlessly accesses the user's webcam for a live video feed, displaying the model's predictions on-screen to facilitate interactive practice sessions. The primary aim is to provide a learning platform for those who are not familiar with sign language, offering them the opportunity to acquire this essential skill and fostering inclusivity in the digital age.
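    A minimal sketch of how such a Flask backend might expose a prediction endpoint for webcam frames posted from the browser; the route name, the "frame" form field, and the predict_letter() helper are hypothetical stand-ins, not the application's actual API.

```python
# Minimal Flask sketch: the browser is assumed to POST a webcam frame as a
# file field named "frame"; predict_letter() is a placeholder for the model.
import io
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

def predict_letter(image):
    """Placeholder for the trained sign-letter classifier."""
    return "A"

@app.route("/predict", methods=["POST"])
def predict():
    frame = Image.open(io.BytesIO(request.files["frame"].read()))
    return jsonify({"letter": predict_letter(frame)})

if __name__ == "__main__":
    app.run(debug=True)
```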

  • Article type: Journal Article
    Sign language recognition technology can help people with hearing impairments communicate with non-hearing-impaired people. At present, with the rapid development of society, deep learning also provides technical support for sign language recognition work. In sign language recognition tasks, traditional convolutional neural networks used to extract spatio-temporal features from sign language videos suffer from insufficient feature extraction, resulting in low recognition rates. Moreover, large video-based sign language datasets require a significant amount of computing resources for training while ensuring the generalization of the network, which poses a challenge for recognition. In this paper, we present a video-based sign language recognition method based on Residual Network (ResNet) and Long Short-Term Memory (LSTM). As the number of network layers increases, the ResNet network can effectively solve the gradient explosion problem and obtain better time-series features. We use the ResNet convolutional network as the backbone model. LSTM utilizes the concept of gates to control unit states and update the output feature values of sequences. ResNet extracts the sign language features. Then, the learned feature space is used as the input of the LSTM network to obtain long-sequence features. This design effectively extracts the spatio-temporal features in sign language videos and improves the recognition rate of sign language actions. An extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed method, with an accuracy of 85.26%, an F1-score of 84.98%, and a precision of 87.77% on the Argentine Sign Language dataset (LSA64).
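    A compact PyTorch sketch of the ResNet-plus-LSTM pipeline described above: frame-wise ResNet-18 embeddings are fed to an LSTM and the last time step is classified. The hidden size, the specific choice of ResNet-18, and classifying from the final time step are assumptions; num_classes=64 simply mirrors the 64 signs of LSA64.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetLSTM(nn.Module):
    """Frame-wise ResNet features fed to an LSTM; layer sizes are illustrative."""
    def __init__(self, num_classes=64, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep 512-d frame embeddings
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                  # clips: (batch, time, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)              # temporal modelling over frames
        return self.head(out[:, -1])           # classify from the last time step

logits = ResNetLSTM()(torch.randn(2, 16, 3, 224, 224))
```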

  • Article type: Journal Article
    This article presents an innovative approach for the task of isolated sign language recognition (SLR); the approach centers on the integration of pose data with motion history images (MHIs) derived from these data. Our research combines spatial information obtained from body, hand, and face poses with the comprehensive details provided by three-channel MHI data concerning the temporal dynamics of the sign. In particular, our finger-pose-based MHI (FP-MHI) feature significantly enhances recognition success, capturing the nuances of finger movements and gestures, unlike existing approaches in SLR. This feature improves the accuracy and reliability of SLR systems by more accurately capturing the fine details and richness of sign language. Additionally, we enhance the overall model accuracy by predicting missing pose data through linear interpolation. Our study, based on a ResNet-18 model enhanced with the randomized leaky rectified linear unit (RReLU), successfully handles the interaction between manual and non-manual features through the fusion of extracted features and classification with a support vector machine (SVM). In our experiments, this integration demonstrates results that are competitive with, and often superior to, current methodologies in the field of SLR across various datasets, including BosphorusSign22k-general, BosphorusSign22k, LSA64, and GSL.
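    For readers unfamiliar with motion history images, the sketch below shows the generic pixel-difference form of an MHI, in which recently moving pixels appear brighter. The paper instead derives its three-channel MHIs from rendered body, hand, and finger poses, and the tau and threshold parameters here are assumed values.

```python
import numpy as np

def motion_history_image(frames, tau=30, threshold=25):
    """Classic motion history image: recently moving pixels are brighter.

    frames: sequence of grayscale uint8 frames; tau and threshold are
    illustrative parameters, not values from the paper.
    """
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    prev = frames[0].astype(np.int16)
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        motion = np.abs(cur - prev) >= threshold   # pixels that changed
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))
        prev = cur
    return (255 * mhi / tau).astype(np.uint8)

# Toy usage on random frames
frames = (np.random.rand(10, 64, 64) * 255).astype(np.uint8)
mhi = motion_history_image(frames)
```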

  • Article type: Journal Article
    Deaf and hard-of-hearing people mainly communicate using sign language, which is a set of signs made using hand gestures combined with facial expressions to make meaningful and complete sentences. The problem that faces deaf and hard-of-hearing people is the lack of automatic tools that translate sign languages into written or spoken text, which has led to a communication gap between them and their communities. Most state-of-the-art vision-based sign language recognition approaches focus on translating non-Arabic sign languages, with few targeting the Arabic Sign Language (ArSL) and even fewer targeting the Saudi Sign Language (SSL). This paper proposes a mobile application that helps deaf and hard-of-hearing people in Saudi Arabia to communicate efficiently with their communities. The prototype is an Android-based mobile application that applies deep learning techniques to translate isolated SSL to text and audio and includes unique features that are not available in other related applications targeting ArSL. The proposed approach, when evaluated on a comprehensive dataset, has demonstrated its effectiveness by outperforming several state-of-the-art approaches and producing results that are comparable to these approaches. Moreover, testing the prototype on several deaf and hard-of-hearing users, in addition to hearing users, proved its usefulness. In the future, we aim to improve the accuracy of the model and enrich the application with more features.

  • Article type: Journal Article
    Sign Language Recognition (SLR) is crucial for enabling communication between the deaf-mute and hearing communities. Nevertheless, the development of a comprehensive sign language dataset is a challenging task due to the complexity and variations in hand gestures. This challenge is particularly evident in the case of Bangla Sign Language (BdSL), where the limited availability of depth datasets impedes accurate recognition. To address this issue, we propose BdSL47, an open-access depth dataset for 47 one-handed static signs (10 digits, from ০ to ৯; and 37 letters, from অ to ঁ) of BdSL. The dataset was created using the MediaPipe framework for extracting depth information. To classify the signs, we developed an Artificial Neural Network (ANN) model with a 63-node input layer, a 47-node output layer, and 4 hidden layers, with dropout in the last two hidden layers, the Adam optimizer, and the ReLU activation function. Based on the selected hyperparameters, the proposed ANN model effectively learns the spatial relationships and patterns from the depth-based gestural input features and achieves an F1 score of 97.84%, indicating the effectiveness of the approach compared to the provided baselines. The availability of BdSL47 as a comprehensive dataset can help improve the accuracy of SLR for BdSL using more advanced deep-learning models.
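    A sketch of the described ANN in Keras: only the 63-node input, 47-node output, four hidden layers, dropout on the last two, Adam, and ReLU come from the text; the hidden-layer widths and dropout rate are assumptions, and the 63 inputs plausibly correspond to 21 MediaPipe hand landmarks times (x, y, z).

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hidden-layer widths and dropout rate are assumptions; the abstract only
# states 63 inputs, 47 outputs, 4 hidden layers, dropout on the last two,
# the Adam optimizer, and ReLU activations.
model = keras.Sequential([
    keras.Input(shape=(63,)),                  # 21 hand landmarks x (x, y, z)
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(47, activation="softmax"),    # 47 static signs
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```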

  • Article type: Journal Article
    Sign language is a form of communication medium for speech- and hearing-disabled people. It has various forms with different, often intricate patterns, which are difficult for the general public to comprehend. Bengali sign language (BdSL) is one of the more difficult sign languages due to its immense number of letters, words, and expression techniques. Machine translation can ease the difficulty disabled people face in communicating with the general population. Within the machine learning (ML) domain, computer vision can provide the solution for them, and every ML solution requires an optimized model and a proper dataset. Therefore, in this research work, we have created a BdSL dataset named 'KU-BdSL', which consists of 30 classes describing 38 consonants ('banjonborno') of the Bengali alphabet. The dataset includes 1500 images of hand signs in total, each representing Bengali consonant(s). Thirty-nine participants (30 males and 9 females) of different ages (21-38 years) participated in the creation of this dataset. We adopted smartphones to capture the images due to the availability of their high-definition cameras. We believe that this dataset can be beneficial to the deaf and dumb (D&D) community. Identification of Bengali consonants of BdSL from images or videos is feasible using the dataset. It can also be employed for a human-machine interface for disabled people. In the future, we will work on the vowels and word level of BdSL.