Sign language recognition

  • Article type: Journal Article
    Addressing the increasing demand for accessible sign language learning tools, this paper introduces an innovative machine learning-driven web application dedicated to sign language learning. This web application represents a significant advancement in sign language education. Unlike traditional approaches, the application's unique methodology involves assigning users different words to spell. Users are tasked with signing each letter of the word, earning a point upon correctly signing the entire word. The paper delves into the development, features, and the machine learning framework underlying the application. Developed using HTML, CSS, JavaScript, and Flask, the web application seamlessly accesses the user's webcam for a live video feed, displaying the model's predictions on-screen to facilitate interactive practice sessions. The primary aim is to provide a learning platform for those who are not familiar with sign language, offering them the opportunity to acquire this essential skill and fostering inclusivity in the digital age.
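    The paper's implementation is not reproduced here, but the setup it describes (Flask serving a live webcam feed with on-screen predictions) is commonly built as in the sketch below. This is a minimal sketch under that assumption; classify_frame() is a hypothetical stand-in for the paper's letter-recognition model.

```python
# Minimal Flask live-video prediction loop: server-side webcam capture with
# OpenCV, streamed as multipart JPEG with the prediction drawn on each frame.
import cv2
from flask import Flask, Response

app = Flask(__name__)

def classify_frame(frame):
    """Placeholder for the letter classifier described in the paper."""
    return "A"  # hypothetical prediction

def gen_frames():
    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        letter = classify_frame(frame)
        cv2.putText(frame, f"Predicted: {letter}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        _, buf = cv2.imencode(".jpg", frame)
        yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n" +
               buf.tobytes() + b"\r\n")

@app.route("/video")
def video():
    return Response(gen_frames(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(debug=True)
```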

  • Article type: Journal Article
    Sign language recognition technology can help people with hearing impairments communicate with non-hearing-impaired people. At present, with the rapid development of society, deep learning also provides technical support for sign language recognition work. In sign language recognition tasks, traditional convolutional neural networks used to extract spatio-temporal features from sign language videos suffer from insufficient feature extraction, resulting in low recognition rates. Moreover, large video-based sign language datasets require significant computing resources for training while ensuring the generalization of the network, which poses a challenge for recognition. In this paper, we present a video-based sign language recognition method based on Residual Network (ResNet) and Long Short-Term Memory (LSTM). As the number of network layers increases, the ResNet network can effectively mitigate the gradient explosion problem and obtain better time-series features. We use the ResNet convolutional network as the backbone model. LSTM utilizes the concept of gates to control unit states and update the output feature values of sequences. ResNet extracts the sign language features; the learned feature space is then used as the input of the LSTM network to obtain long-sequence features. The method can effectively extract the spatio-temporal features in sign language videos and improve the recognition rate of sign language actions. An extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed method, with an accuracy of 85.26%, an F1-score of 84.98%, and a precision of 87.77% on Argentine Sign Language (LSA64).
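    A minimal PyTorch sketch of the kind of ResNet-plus-LSTM pipeline the abstract describes: a ResNet-18 backbone extracts per-frame features and an LSTM aggregates them over time. The layer sizes and the 64-class output (matching LSA64) are illustrative assumptions, not the authors' exact configuration.

```python
# ResNet backbone for per-frame features, LSTM for temporal modeling.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetLSTM(nn.Module):
    def __init__(self, num_classes=64, hidden=512):  # LSA64 has 64 classes
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep the 512-d pooled features
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))  # (B*T, 512)
        feats = feats.view(b, t, -1)                # (B, T, 512)
        out, _ = self.lstm(feats)                   # (B, T, hidden)
        return self.head(out[:, -1])                # classify from last step

logits = ResNetLSTM()(torch.randn(2, 16, 3, 224, 224))  # (2, 64)
```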

  • Article type: Journal Article
    This article presents an innovative approach for the task of isolated sign language recognition (SLR); this approach centers on the integration of pose data with motion history images (MHIs) derived from those data. Our research combines spatial information obtained from body, hand, and face poses with the comprehensive details provided by three-channel MHI data concerning the temporal dynamics of the sign. In particular, our finger pose-based MHI (FP-MHI) feature significantly enhances recognition success, capturing the nuances of finger movements and gestures in a way existing SLR approaches do not. This feature improves the accuracy and reliability of SLR systems by more accurately capturing the fine details and richness of sign language. Additionally, we enhance the overall model accuracy by predicting missing pose data through linear interpolation. Our study, based on the randomized leaky rectified linear unit (RReLU)-enhanced ResNet-18 model, successfully handles the interaction between manual and non-manual features through the fusion of extracted features and classification with a support vector machine (SVM). In our experiments, this innovative integration demonstrates competitive and superior results compared to current methodologies in the field of SLR across various datasets, including BosphorusSign22k-general, BosphorusSign22k, LSA64, and GSL.
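    A hedged sketch of two ideas named in this abstract: swapping ReLU for RReLU in a ResNet-18 backbone, and classifying fused pose/MHI features with an SVM. Feature extraction itself is stubbed with random arrays; all dimensions are illustrative, not the authors' configuration.

```python
import numpy as np
import torch.nn as nn
from torchvision.models import resnet18
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def swap_relu_for_rrelu(module):
    """Recursively replace ReLU activations with randomized leaky ReLU."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.RReLU())
        else:
            swap_relu_for_rrelu(child)

backbone = resnet18(weights=None)
swap_relu_for_rrelu(backbone)           # "RReLU-enhanced ResNet-18"

# Feature-level fusion of the two streams, then SVM classification.
rng = np.random.default_rng(0)
n, n_classes = 200, 10                  # illustrative sizes
pose_feats = rng.normal(size=(n, 512))  # stand-in for pose-stream features
mhi_feats = rng.normal(size=(n, 512))   # stand-in for FP-MHI features
X = np.concatenate([pose_feats, mhi_feats], axis=1)
y = rng.integers(0, n_classes, size=n)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```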

  • Article type: Journal Article
    Deaf and hard-of-hearing people mainly communicate using sign language, which is a set of signs made using hand gestures combined with facial expressions to make meaningful and complete sentences. The problem that faces deaf and hard-of-hearing people is the lack of automatic tools that translate sign languages into written or spoken text, which has led to a communication gap between them and their communities. Most state-of-the-art vision-based sign language recognition approaches focus on translating non-Arabic sign languages, with few targeting Arabic Sign Language (ArSL) and even fewer targeting Saudi Sign Language (SSL). This paper proposes a mobile application that helps deaf and hard-of-hearing people in Saudi Arabia communicate effectively with their communities. The prototype is an Android-based mobile application that applies deep learning techniques to translate isolated SSL to text and audio, and it includes unique features that are not available in other related applications targeting ArSL. When evaluated on a comprehensive dataset, the proposed approach demonstrated its effectiveness, outperforming several state-of-the-art approaches and producing results comparable to the rest. Moreover, testing the prototype on several deaf and hard-of-hearing users, in addition to hearing users, proved its usefulness. In the future, we aim to improve the accuracy of the model and enrich the application with more features.

  • Article type: Journal Article
    Sign Language Recognition (SLR) is crucial for enabling communication between the deaf-mute and hearing communities. Nevertheless, the development of a comprehensive sign language dataset is a challenging task due to the complexity and variations in hand gestures. This challenge is particularly evident in the case of Bangla Sign Language (BdSL), where the limited availability of depth datasets impedes accurate recognition. To address this issue, we propose BdSL47, an open-access depth dataset for 47 one-handed static signs of BdSL (10 digits, from ০ to ৯, and 37 letters, from অ to ঁ). The dataset was created using the MediaPipe framework to extract depth information. To classify the signs, we developed an Artificial Neural Network (ANN) model with a 63-node input layer, a 47-node output layer, and 4 hidden layers, with dropout applied to the last two hidden layers; the model was trained with the Adam optimizer and ReLU activations. Based on the selected hyperparameters, the proposed ANN model effectively learns the spatial relationships and patterns from the depth-based gestural input features and achieves an F1 score of 97.84%, indicating the effectiveness of the approach compared to the baselines provided. The availability of BdSL47 as a comprehensive dataset can help improve the accuracy of SLR for BdSL using more advanced deep-learning models.
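    The abstract fixes the input width at 63, which matches MediaPipe's 21 hand landmarks times three coordinates. A PyTorch sketch of such a classifier follows; the hidden-layer widths and dropout rate are assumptions, since only the layer count, dropout placement, optimizer, and activation are specified.

```python
import torch
import torch.nn as nn

# 63 inputs -> 4 hidden layers (dropout on the last two) -> 47 outputs.
model = nn.Sequential(
    nn.Linear(63, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 47),               # one logit per BdSL47 sign
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on dummy landmark data.
x = torch.randn(32, 63)               # batch of flattened landmarks
y = torch.randint(0, 47, (32,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```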

  • Article type: Journal Article
    Sign language is a communication medium for speech- and hearing-disabled people. It has various forms with different troublesome patterns, which are difficult for the general mass to comprehend. Bengali Sign Language (BdSL) is one of the more difficult sign languages due to its immense number of letters, words, and expression techniques. Machine translation can make it easier for disabled people to communicate with the general public. From the machine learning (ML) domain, computer vision can be the solution for them, and every ML solution requires an optimized model and a proper dataset. Therefore, in this research work, we have created a BdSL dataset named 'KU-BdSL', which consists of 30 classes describing 38 consonants ('banjonborno') of the Bengali alphabet. The dataset includes 1500 images of hand signs in total, each representing Bengali consonant(s). Thirty-nine participants (30 males and 9 females) of different ages (21-38 years) participated in the creation of this dataset. We adopted smartphones to capture the images due to the availability of their high-definition cameras. We believe that this dataset can be beneficial to the deaf and dumb (D&D) community. Identification of Bengali consonants of BdSL from images or videos is feasible using the dataset. It can also be employed in a human-machine interface for disabled people. In the future, we will work on the vowels and word level of BdSL.
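    If the released KU-BdSL images are organized one folder per class (an assumption on our part; the actual layout of the published dataset may differ), loading them for experiments takes only a few lines with torchvision:

```python
import torch
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumes a "KU-BdSL/" root with one subdirectory per class (30 expected).
dataset = datasets.ImageFolder("KU-BdSL", transform=tfm)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, len(dataset.classes))  # e.g. torch.Size([32,3,224,224]) 30
```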

  • Article type: Journal Article
    Supervised deep learning models can be optimised by applying regularisation techniques to reduce overfitting, which can prove difficult when fine-tuning the associated hyperparameters. Not all hyperparameters are equal, and understanding the effect each hyperparameter and regularisation technique has on the performance of a given model is of paramount importance in research. We present the first comprehensive, large-scale ablation study for an encoder-only transformer to model sign language using the improved Word-level American Sign Language dataset (WLASL-alt) and human pose estimation keypoint data, with a view to putting constraints on the potential to optimise the task. We measure the impact a range of model parameter regularisation and data augmentation techniques have on sign classification accuracy. We demonstrate that, within the quoted uncertainties and apart from ℓ2 parameter regularisation, none of the regularisation techniques we employ have an appreciable positive impact on performance, which we find to be in contradiction to results reported by other similar, albeit smaller scale, studies. We also demonstrate that performance on this task is bounded by the small dataset size rather than by the choice of model parameter regularisation or common or basic dataset augmentation techniques. Furthermore, using the base model configuration, we report a new maximum top-1 classification accuracy of 84% on 100 signs, thereby improving on the previous benchmark result for this model architecture and dataset.
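    A minimal sketch of an encoder-only transformer classifying a sign from a sequence of pose-estimation keypoints, in the spirit of the study above; the dimensions, depth, and class-token readout are assumptions, not the paper's exact configuration. Note that ℓ2 parameter regularisation, the one technique the study found helpful, corresponds to the optimizer's weight_decay term.

```python
import torch
import torch.nn as nn

class KeypointTransformer(nn.Module):
    def __init__(self, kp_dim=2 * 75, d_model=128, n_classes=100):
        super().__init__()                 # e.g. 75 keypoints x 2 coordinates
        self.embed = nn.Linear(kp_dim, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, kp):                 # kp: (B, T, kp_dim)
        x = self.embed(kp)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))
        return self.head(x[:, 0])          # classify from the class token

model = KeypointTransformer()
# weight_decay implements l2 parameter regularisation.
opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
logits = model(torch.randn(2, 60, 150))    # (2, 100)
```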

  • Article type: Journal Article
    In recent years, the synaptic properties of transistors have been extensively studied. Compared with transistors based on liquid or organic materials, inorganic solid electrolyte-gated transistors have the advantage of better chemical stability. This study uses a simple, low-cost solution technology to prepare In2O3 transistors gated by an AlLiO solid electrolyte. The electrochemical performance of the device is achieved by forming an electric double layer and electrochemical doping, which can mimic basic functions of biological synapses, such as excitatory postsynaptic current (EPSC), paired-pulse facilitation (PPF), and spike-timing-dependent plasticity (STDP). Furthermore, complex synaptic behaviors such as Pavlovian classical conditioning and Morse code ("Qingdao") are successfully emulated. An artificial neural network based on the transistors is built to recognize sign language and enable sign language interpretation, with a 95% identification accuracy. Additionally, handwritten-digit recognition accuracy is 94%, and even with various levels of Gaussian noise, the recognition rate remains above 84%. The above findings demonstrate the potential of In2O3/AlLiO TFT in shaping the next generation of artificial intelligence.

  • Article type: Systematic Review
    The analysis and recognition of sign languages are currently active fields of research focused on sign recognition. Various approaches differ in terms of analysis methods and the devices used for sign acquisition. Traditional methods rely on video analysis or on spatial positioning data calculated using motion capture tools. In contrast to these conventional recognition and classification approaches, electromyogram (EMG) signals, which measure muscle electrical activity, offer a promising technology for detecting gestures. These EMG-based approaches have recently gained attention due to their advantages, which prompted us to conduct a comprehensive study of the methods, approaches, and projects utilizing EMG sensors for sign language handshape recognition. In this paper, we provide an overview of the sign language recognition field through a literature review, offering an in-depth review of the most significant techniques, categorized according to their respective methodologies. The survey discusses the progress and challenges of sign language recognition systems based on surface electromyography (sEMG) signals. These systems have shown promise but face issues such as sEMG data variability and sensor placement; multiple sensors enhance reliability and accuracy. Machine learning, including deep learning, is used to address these challenges. Common classifiers in sEMG-based sign language recognition include SVM, ANN, CNN, KNN, HMM, and LSTM. While SVM and ANN are widely used, random forest and KNN have shown better performance in some cases, and a multilayer perceptron neural network achieved perfect accuracy in one study. CNN, often paired with LSTM, ranks as the third most popular classifier and can achieve exceptional accuracy, reaching up to 99.6% when utilizing both EMG and IMU data. LSTM is highly regarded for handling sequential dependencies in EMG signals, making it a critical component of sign language recognition systems. In summary, the survey highlights the prevalence of SVM and ANN classifiers but also points to the effectiveness of alternatives such as random forest and KNN, while LSTM emerges as the most suitable algorithm for capturing sequential dependencies and improving gesture recognition in EMG-based sign language recognition systems.
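    As a concrete illustration of the classical pipeline the survey covers, the sketch below windows a synthetic multichannel sEMG signal, extracts standard time-domain features, and cross-validates three of the commonly cited classifiers. The feature choices (MAV, RMS, zero crossings) are conventional examples, not drawn from any single surveyed paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def features(window):                      # window: (samples, channels)
    mav = np.mean(np.abs(window), axis=0)  # mean absolute value
    rms = np.sqrt(np.mean(window**2, axis=0))
    zc = np.sum(np.diff(np.sign(window), axis=0) != 0, axis=0)  # zero crossings
    return np.concatenate([mav, rms, zc])

rng = np.random.default_rng(0)
windows = rng.normal(size=(300, 200, 8))   # 300 windows, 8 sEMG channels
X = np.array([features(w) for w in windows])
y = rng.integers(0, 5, size=300)           # 5 hypothetical handshapes

for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier()),
                  ("KNN", KNeighborsClassifier())]:
    print(name, cross_val_score(clf, X, y, cv=3).mean())
```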

  • Article type: Journal Article
    Finding ways to enable seamless communication between deaf and able-bodied individuals has been a challenging and pressing issue. This paper proposes a solution to this problem by designing a low-cost data glove that utilizes multiple inertial sensors, with the purpose of achieving efficient and accurate sign language recognition. In this study, four machine learning models were employed to recognize 20 different types of dynamic sign language data used by deaf individuals: decision tree (DT), support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF). Additionally, a proposed attention-based bidirectional long short-term memory network (Attention-BiLSTM) was utilized in the process. Furthermore, this study verifies the impact of the number and placement of data glove nodes on the accuracy of recognizing complex dynamic sign language. Finally, the proposed method is compared with existing state-of-the-art algorithms using nine public datasets. The results indicate that the Attention-BiLSTM and RF algorithms achieve the highest performance in recognizing the twenty dynamic sign language gestures, with accuracies of 98.85% and 97.58%, respectively. This provides evidence for the feasibility of our proposed data glove and recognition methods. This study may serve as a valuable reference for the development of wearable sign language recognition devices and promote easier communication between deaf and able-bodied individuals.
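    A minimal PyTorch sketch of an attention-based BiLSTM over inertial-sensor sequences, matching the Attention-BiLSTM idea named above; the sensor count, feature layout, and the simple additive attention pooling over time are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class AttentionBiLSTM(nn.Module):
    def __init__(self, in_dim=6 * 5, hidden=128, n_classes=20):
        super().__init__()                 # e.g. 5 IMUs x 6 channels each
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (B, T, in_dim)
        h, _ = self.lstm(x)                # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        ctx = (w * h).sum(dim=1)           # weighted temporal pooling
        return self.head(ctx)

logits = AttentionBiLSTM()(torch.randn(4, 100, 30))  # (4, 20) class scores
```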
