head pose estimation

  • Article type: Journal Article
    Optical tracking of head pose via fiducial markers has been proven to enable effective correction of motion artifacts in the brain during magnetic resonance imaging but remains difficult to implement in the clinic due to lengthy calibration and set-up times. Advances in deep learning for markerless head pose estimation have yet to be applied to this problem because of the sub-millimetre spatial resolution required for motion correction. In the present work, two optical tracking systems are described for the development and training of a neural network: one marker-based system (a testing platform for measuring ground truth head pose) with high tracking fidelity to act as the training labels, and one markerless deep-learning-based system using images of the markerless head as input to the network. The markerless system has the potential to overcome issues of marker occlusion, insufficient rigid attachment of the marker, lengthy calibration times, and unequal performance across degrees of freedom (DOF), all of which hamper the adoption of marker-based solutions in the clinic. Detail is provided on the development of a custom moiré-enhanced fiducial marker for use as ground truth and on the calibration procedure for both optical tracking systems. Additionally, the development of a synthetic head pose dataset is described for the proof of concept and initial pre-training of a simple convolutional neural network. Results indicate that the ground truth system has been sufficiently calibrated and can track head pose with an error of <1 mm and <1°. Tracking data of a healthy, adult participant are shown. Pre-training results show that the average root-mean-squared error across the 6 DOF is 0.13 and 0.36 (mm or degrees) on a head model included and excluded from the training dataset, respectively. Overall, this work indicates excellent feasibility of the deep-learning-based approach and will enable future work in training and testing on a real dataset in the MRI environment.
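
    To make the pre-training setup above concrete, the following is a minimal sketch of a 6-DOF pose regressor in PyTorch. The layer sizes, input resolution, and loss are illustrative assumptions, not the authors' network; only the idea of regressing three translations and three rotations against marker-based ground-truth labels and reporting a per-DOF RMSE is taken from the abstract.

```python
# Minimal 6-DOF head-pose regressor (illustrative only, not the authors' architecture).
import torch
import torch.nn as nn

class PoseCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 6)  # [tx, ty, tz] in mm + [rx, ry, rz] in degrees

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = PoseCNN()
images = torch.randn(8, 1, 128, 128)          # batch of markerless head images (assumed size)
target = torch.zeros(8, 6)                    # marker-based ground-truth poses
pred = model(images)
loss = nn.functional.mse_loss(pred, target)   # training objective
rmse_per_dof = (pred - target).pow(2).mean(dim=0).sqrt()  # per-DOF RMSE as reported above
```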

  • Article type: Journal Article
    Most facial analysis methods perform well in standardized testing but not in real-world testing. The main reason is that training models cannot easily learn various human features and background noise, especially for facial landmark detection and head pose estimation tasks with limited and noisy training datasets. To alleviate the gap between standardized and real-world testing, we propose a pseudo-labeling technique using a face recognition dataset consisting of various people and background noise. The use of our pseudo-labeled training dataset can help to overcome the lack of diversity among the people in the dataset. Our integrated framework is constructed using complementary multitask learning methods to extract robust features for each task. Furthermore, introducing pseudo-labeling and multitask learning improves the face recognition performance by enabling the learning of pose-invariant features. Our method achieves state-of-the-art (SOTA) or near-SOTA performance on the AFLW2000-3D and BIWI datasets for facial landmark detection and head pose estimation, with competitive face verification performance on the IJB-C test dataset for face recognition. We demonstrate this through a novel testing methodology that categorizes cases as soft, medium, and hard based on the pose values of IJB-C. The proposed method achieves stable performance even when the dataset lacks diverse face identifications.
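
    The pseudo-labeling step described above could look roughly like the sketch below: a teacher pose model labels an unlabeled face-recognition dataset and only confident predictions are kept as targets for the multitask student. The pseudo_label function, the uncertainty output, and the threshold are hypothetical; the paper's actual selection criterion is not specified in the abstract.

```python
# Hedged sketch of pseudo-labeling an unlabeled face-recognition dataset with a teacher model.
import torch

@torch.no_grad()
def pseudo_label(teacher, unlabeled_loader, max_uncertainty=5.0):
    """Return (image, pseudo-pose) pairs whose predicted uncertainty is below a threshold."""
    teacher.eval()
    kept = []
    for images in unlabeled_loader:
        pose, uncertainty = teacher(images)   # assumed: teacher returns pose and an uncertainty score
        confident = uncertainty < max_uncertainty
        kept.extend(zip(images[confident], pose[confident]))
    return kept   # added to the multitask training set alongside the labeled data
```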

  • Article type: Journal Article
    In this paper, we propose a learning state evaluation method based on face detection and head pose estimation. This method is suitable for mobile devices with weak computing power, so it is necessary to control the parameter quantity of the face detection and head pose estimation network. Firstly, we propose a ghost and attention module (GA)-based face detection network (GA-Face). GA-Face reduces the number of parameters and computation in the feature extraction network through the ghost module, and focuses the network on important features through a parameter-free attention mechanism. We also propose a lightweight dual-branch (DB) head pose estimation network: DB-Net. Finally, we propose a student learning state evaluation algorithm. This algorithm can evaluate the learning status of students based on the distance between their faces and the screen, as well as their head posture. We validate the effectiveness of the proposed GA-Face and DB-Net on several standard face detection datasets and standard head pose estimation datasets. Finally, we validate, through practical cases, that the proposed online learning state assessment method can effectively assess the level of student attention and concentration, and, due to its low computational complexity, will not interfere with the student's learning process.
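
    A hedged sketch of the kind of rule such a learning-state evaluation could use, combining face-to-screen distance (approximated from the detected face width with a pinhole model) and head-pose thresholds. The focal length, real face width, and thresholds below are invented for illustration and are not the paper's values.

```python
# Rule-based attention check from face size and head pose (illustrative thresholds only).

def estimate_distance_mm(face_width_px, focal_px=600.0, real_face_width_mm=150.0):
    """Pinhole approximation: distance = focal_length * real_width / pixel_width."""
    return focal_px * real_face_width_mm / max(face_width_px, 1e-6)

def is_attentive(face_width_px, yaw_deg, pitch_deg,
                 max_distance_mm=800.0, max_yaw=25.0, max_pitch=20.0):
    distance = estimate_distance_mm(face_width_px)
    return (distance <= max_distance_mm
            and abs(yaw_deg) <= max_yaw
            and abs(pitch_deg) <= max_pitch)

print(is_attentive(face_width_px=220, yaw_deg=10, pitch_deg=-5))  # True
```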

  • Article type: Journal Article
    Head pose estimation serves various applications, such as gaze estimation, fatigue-driven detection, and virtual reality. Nonetheless, achieving precise and efficient predictions remains challenging owing to the reliance on singular data sources. Therefore, this study introduces a technique involving multimodal feature fusion to elevate head pose estimation accuracy. The proposed method amalgamates data derived from diverse sources, including RGB and depth images, to construct a comprehensive three-dimensional representation of the head, commonly referred to as a point cloud. The noteworthy innovations of this method encompass a residual multilayer perceptron structure within PointNet, designed to tackle gradient-related challenges, along with spatial self-attention mechanisms aimed at noise reduction. The enhanced PointNet and ResNet networks are utilized to extract features from both point clouds and images. These extracted features undergo fusion. Furthermore, the incorporation of a scoring module strengthens robustness, particularly in scenarios involving facial occlusion. This is achieved by preserving features from the highest-scoring point cloud. Additionally, a prediction module is employed, combining classification and regression methodologies to accurately estimate head poses. The proposed method improves the accuracy and robustness of head pose estimation, especially in cases involving facial obstructions. These advancements are substantiated by experiments conducted using the BIWI dataset, demonstrating the superiority of this method over existing techniques.
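
    The RGB-D fusion above starts from a head point cloud; a common way to obtain one is to back-project the depth image with the camera intrinsics, as in this sketch. The intrinsics, image size, and depth values are placeholders, and the paper's actual preprocessing (and the BIWI intrinsics) may differ.

```python
# Back-project a depth image into a point cloud using pinhole intrinsics (fx, fy, cx, cy).
import numpy as np

def depth_to_point_cloud(depth_mm, fx, fy, cx, cy):
    """depth_mm: (H, W) depth in millimetres; returns (N, 3) points in the camera frame."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]          # drop invalid (zero-depth) pixels

cloud = depth_to_point_cloud(np.random.randint(0, 2000, (480, 640)), 570.0, 570.0, 320.0, 240.0)
```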

  • Article type: Journal Article
    The increasing prevalence of behavioral disorders in children is of growing concern within the medical community. Recognising the significance of early identification and intervention for atypical behaviors, there is a consensus on their pivotal role in improving outcomes. Due to inadequate facilities and a shortage of medical professionals with specialized expertise, traditional diagnostic methods have been unable to effectively address the rising incidence of behavioral disorders. Hence, there is a need to develop automated approaches for the diagnosis of behavioral disorders in children, to overcome the challenges with traditional methods. The purpose of this study is to develop an automated model capable of analyzing videos to differentiate between typical and atypical repetitive head movements in children. To address problems resulting from the limited availability of child datasets, various learning methods are employed to mitigate these issues. In this work, we present a fusion of transformer networks and Non-deterministic Finite Automata (NFA) techniques, which classify repetitive head movements of a child as typical or atypical based on an analysis of gender, age, and type of repetitive head movement, along with count, duration, and frequency of each repetitive head movement. Experimentation was carried out with different transfer learning methods to enhance the performance of the model. The experimental results on five datasets: NIR face dataset, Bosphorus 3D face dataset, ASD dataset, SSBD dataset, and the Head Movements in the Wild dataset, indicate that our proposed model has outperformed many state-of-the-art frameworks when distinguishing typical and atypical repetitive head movements in children.
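
    As a toy illustration of the NFA component, the sketch below runs a generic non-deterministic finite automaton over a sequence of discretized head-movement tokens. The states, alphabet, and transition table are invented for illustration; they are not the automaton used in the paper.

```python
# Generic NFA acceptance over a token sequence (illustrative states and transitions).

def nfa_accepts(sequence, transitions, start, accepting):
    """transitions: dict mapping (state, symbol) -> set of next states."""
    current = {start}
    for symbol in sequence:
        current = set().union(*(transitions.get((s, symbol), set()) for s in current))
        if not current:
            return False
    return bool(current & accepting)

# Example: accept once at least three alternating left/right head turns have occurred.
T = {("q0", "L"): {"q1"}, ("q1", "R"): {"q2"}, ("q2", "L"): {"q3"},
     ("q3", "R"): {"q3"}, ("q3", "L"): {"q3"}}
print(nfa_accepts(["L", "R", "L", "R"], T, "q0", {"q3"}))  # True
```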

  • Article type: Journal Article
    Head pose estimation is an important technology for analyzing human behavior and has been widely researched and applied in areas such as human-computer interaction and fatigue detection. However, traditional head pose estimation networks suffer from the problem of easily losing spatial structure information, particularly in complex scenarios where occlusions and multiple object detections are common, resulting in low accuracy. To address the above issues, we propose a head pose estimation model based on the residual network and capsule network. Firstly, a deep residual network is used to extract features from three stages, capturing spatial structure information at different levels, and a global attention block is employed to enhance the spatial weight of feature extraction. To effectively avoid the loss of spatial structure information, the features are encoded and transmitted to the output using an improved capsule network, which is enhanced in its generalization ability through self-attention routing mechanisms. To enhance the robustness of the model, we optimize the Huber loss, which is used in head pose estimation for the first time. Finally, experiments are conducted on three popular public datasets, 300W-LP, AFLW2000, and BIWI. The results demonstrate that the proposed method achieves state-of-the-art results, particularly in scenarios with occlusions.
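
    For reference, the Huber loss mentioned above has the standard piecewise form sketched below: quadratic for small residuals and linear beyond a threshold delta, which damps the influence of outlier samples. The delta value here is an illustrative choice, not necessarily the paper's setting.

```python
# Standard Huber loss over pose residuals r = y_pred - y_true (delta is illustrative).
import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    r = np.abs(y_pred - y_true)
    quadratic = 0.5 * r ** 2                 # used where |r| <= delta
    linear = delta * (r - 0.5 * delta)       # used where |r| > delta
    return np.where(r <= delta, quadratic, linear).mean()

print(huber_loss(np.array([1.2, 30.0]), np.array([1.0, 0.0])))  # outlier term grows only linearly
```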

  • Article type: Journal Article
    Drowsiness-related car accidents continue to have a significant effect on road safety. Many of these accidents can be eliminated by alerting the drivers once they start feeling drowsy. This work presents a non-invasive system for real-time driver drowsiness detection using visual features. These features are extracted from videos obtained from a camera installed on the dashboard. The proposed system uses facial landmarks and face mesh detectors to locate the regions of interest where mouth aspect ratio, eye aspect ratio, and head pose features are extracted and fed to three different classifiers: random forest, sequential neural network, and linear support vector machine classifiers. Evaluations of the proposed system over the National Tsing Hua University driver drowsiness detection dataset showed that it can successfully detect and alert drowsy drivers with an accuracy of up to 99%.
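
    The eye-aspect-ratio (EAR) feature referenced above is commonly computed from six eye landmarks, as in this sketch; the landmark ordering and the drowsiness threshold mentioned in the comment depend on the detector and are assumptions here, not values from the paper.

```python
# Eye aspect ratio from six eye-contour landmarks p1..p6 (ordering is detector-dependent).
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of landmarks ordered p1..p6 around the eye contour."""
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)

# A sustained EAR below roughly 0.2 over consecutive frames is a common drowsiness cue.
open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], dtype=float)
print(eye_aspect_ratio(open_eye))   # ~0.67 for a wide-open eye
```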

  • Article type: Journal Article
    In recent years, the pandemic has forced the education system to shift from traditional teaching to online teaching or blended learning. The ability to monitor remote online examinations efficiently is a limiting factor in the scalability of this stage of online evaluation in the education system. Human proctoring is the most common approach, either asking learners to take a test in examination centers or monitoring them visually after asking learners to switch on their cameras. However, these methods require substantial labor, effort, infrastructure, and hardware. This paper presents an automated AI-based proctoring system, the 'Attentive system', for online evaluation by capturing live video of the examinee. Our Attentive system includes four components to estimate malpractice: face detection, multiple-person detection, face spoofing, and head pose estimation. Attentive Net detects faces and draws bounding boxes along with confidences. Attentive Net also checks the alignment of the face using the rotation matrix of an affine transformation. The face net algorithm is combined with Attentive-Net to extract landmarks and facial features. The process for identifying spoofed faces is initiated only for aligned faces, using a shallow CNN liveness net. The head pose of the examinee is estimated using the solvePnP algorithm to check whether he/she is seeking help from others. Crime Investigation and Prevention Lab (CIPL) datasets and customized datasets with various types of malpractice are used to evaluate the proposed system. Extensive experimental results demonstrate that the method is accurate, reliable, and robust, and that the proctoring system can be practically implemented in a real-time environment as an automated proctoring system. An improved accuracy of 0.87 is reported with the combination of Attentive Net, the liveness net, and head pose estimation.
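
    Head pose from 2D landmarks via solvePnP typically looks like the sketch below, using OpenCV's cv2.solvePnP and cv2.Rodrigues. The generic 3D face-model points, landmark coordinates, and camera matrix are placeholders, not values from the paper.

```python
# Recover head rotation/translation by matching 2D facial landmarks to a generic 3D face model.
import cv2
import numpy as np

model_points = np.array([          # rough 3D model (mm): nose, chin, eye corners, mouth corners
    [0.0, 0.0, 0.0], [0.0, -330.0, -65.0], [-225.0, 170.0, -135.0],
    [225.0, 170.0, -135.0], [-150.0, -150.0, -125.0], [150.0, -150.0, -125.0]], dtype=np.float64)
image_points = np.array([[359, 391], [399, 561], [337, 297],
                         [513, 301], [345, 465], [453, 469]], dtype=np.float64)
camera_matrix = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros((4, 1))     # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)
rotation_matrix, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
```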

  • Article type: Journal Article
    Head pose estimation, which predicts the Euler angles of the head in an image, is one of the essential tasks in computer vision. In recent years, CNN-based methods for head pose estimation have achieved excellent performance. Their training relies on RGB images providing facial landmarks or depth images from RGBD cameras. However, labeling facial landmarks is complex for large angular head poses in RGB images, and RGBD cameras are unsuitable for outdoor scenes. We propose a simple and effective annotation method for the head pose in RGB images. This novel method uses a 3D virtual human head to simulate the head pose in the RGB image. The Euler angles can be calculated from the change in coordinates of the 3D virtual head. We then create a dataset using our annotation method, the 2DHeadPose dataset, which contains a rich set of attributes, dimensions, and angles. Finally, we propose Gaussian label smoothing to suppress annotation noise and reflect inter-class relationships. A baseline approach is established using Gaussian label smoothing. Experiments demonstrate that our annotation method, datasets, and Gaussian label smoothing are very effective. Our baseline approach surpasses most current state-of-the-art methods. The annotation tool, dataset, and source code are publicly available at https://github.com/youngnuaa/2DHeadPose.
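
    Gaussian label smoothing for a binned Euler angle can be sketched as below: the one-hot bin target is replaced by a Gaussian centred on the true angle, which tolerates annotation noise and encodes how close neighbouring bins are. The bin width and sigma are illustrative, not necessarily the paper's settings.

```python
# Replace a one-hot angle-bin target with a Gaussian-smoothed soft target (illustrative bins/sigma).
import numpy as np

def gaussian_smoothed_label(angle_deg, bin_edges_deg, sigma_deg=3.0):
    centers = (bin_edges_deg[:-1] + bin_edges_deg[1:]) / 2.0
    weights = np.exp(-0.5 * ((centers - angle_deg) / sigma_deg) ** 2)
    return weights / weights.sum()            # normalized soft target over angle bins

bins = np.arange(-99, 100, 3, dtype=np.float64)   # yaw bins spanning -99 to +99 degrees
target = gaussian_smoothed_label(31.0, bins)      # peaks at the bin containing 31 degrees
```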

  • Article type: Journal Article
    Head pose assessment can reveal important clinical information on human motor control. Quantitative assessments have the potential to objectively evaluate head pose and movements' specifics, in order to monitor the progression of a disease or the effectiveness of a treatment. Optoelectronic camera-based motion-capture systems, recognized as a gold standard in clinical biomechanics, have been proposed for head pose estimation. However, these systems require markers to be positioned on the person's face, which is impractical for everyday clinical practice. Furthermore, the limited access to this type of equipment and the emerging trend to assess mobility in natural environments support the development of algorithms capable of estimating head orientation using off-the-shelf sensors, such as RGB cameras. Although artificial vision is a popular field of research, limited validation of human pose estimation based on image recognition suitable for clinical applications has been performed. This paper first provides a brief review of available head pose estimation algorithms in the literature. Current state-of-the-art head pose algorithms designed to capture the facial geometry from videos, OpenFace 2.0, MediaPipe, and 3DDFA_V2, are then further evaluated and compared. Accuracy is assessed by comparing these approaches to a baseline, measured with an optoelectronic camera-based motion-capture system. Results reveal a mean error lower than or equal to 5.6° for 3DDFA_V2 depending on the plane of movement, while the mean error reaches 14.1° and 11.0° for OpenFace 2.0 and MediaPipe, respectively. This demonstrates the superiority of the 3DDFA_V2 algorithm in estimating head pose, in different directions of motion, and suggests that this algorithm can be used in clinical scenarios.
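
    The comparison above reduces to a per-axis error between each estimator's Euler angles and the motion-capture baseline; a minimal version of that metric is sketched below, assuming the two streams are already time-synchronized and expressed in the same reference frame (the alignment step itself, which is the authors', is not shown).

```python
# Per-axis mean absolute error of estimated Euler angles against a motion-capture baseline.
import numpy as np

def mean_abs_error_deg(estimated, baseline):
    """estimated, baseline: (N, 3) arrays of [yaw, pitch, roll] in degrees."""
    return np.abs(estimated - baseline).mean(axis=0)   # one value per rotation axis

est = np.random.normal(0, 10, (500, 3))                # synthetic estimator output
ref = est + np.random.normal(0, 2, (500, 3))           # synthetic baseline, for illustration only
print(mean_abs_error_deg(est, ref))
```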