head pose estimation

  • Article type: Journal Article
    Optical tracking of head pose via fiducial markers has been proven to enable effective correction of motion artifacts in the brain during magnetic resonance imaging but remains difficult to implement in the clinic due to lengthy calibration and setup times. Advances in deep learning for markerless head pose estimation have yet to be applied to this problem because of the sub-millimetre spatial resolution required for motion correction. In the present work, two optical tracking systems are described for the development and training of a neural network: one marker-based system (a testing platform for measuring ground truth head pose) with high tracking fidelity to act as the training labels, and one markerless deep-learning-based system using images of the markerless head as input to the network. The markerless system has the potential to overcome issues of marker occlusion, insufficient rigid attachment of the marker, lengthy calibration times, and unequal performance across degrees of freedom (DOF), all of which hamper the adoption of marker-based solutions in the clinic. Detail is provided on the development of a custom moiré-enhanced fiducial marker for use as ground truth and on the calibration procedure for both optical tracking systems. Additionally, the development of a synthetic head pose dataset is described for the proof of concept and initial pre-training of a simple convolutional neural network. Results indicate that the ground truth system has been sufficiently calibrated and can track head pose with an error of <1 mm and <1°. Tracking data of a healthy, adult participant are shown. Pre-training results show that the average root-mean-squared error across the 6 DOF is 0.13 and 0.36 (mm or degrees) on a head model included and excluded from the training dataset, respectively. Overall, this work indicates excellent feasibility of the deep-learning-based approach and will enable future work in training and testing on a real dataset in the MRI environment.
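
    A minimal sketch of the per-axis error metric quoted above: root-mean-squared error computed independently for each of the six degrees of freedom and then averaged. The array shapes, units, and synthetic data are assumptions for illustration, not the authors' code.

```python
import numpy as np

def per_dof_rmse(pred, truth):
    """RMSE for each of the 6 DOF (translations in mm, rotations in degrees).

    pred, truth: arrays of shape (n_frames, 6) holding predicted and
    ground-truth head poses for every frame.
    """
    err = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return np.sqrt((err ** 2).mean(axis=0))

# Hypothetical usage with synthetic poses; the scalar mean over the six
# per-DOF errors is the kind of figure quoted in the abstract (0.13 / 0.36).
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 6))
pred = truth + rng.normal(scale=0.1, size=(100, 6))
print(per_dof_rmse(pred, truth).mean())
```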

  • Article type: Journal Article
    Most facial analysis methods perform well in standardized testing but not in real-world testing. The main reason is that training models cannot easily learn varied human features and background noise, especially for facial landmark detection and head pose estimation tasks with limited and noisy training datasets. To narrow the gap between standardized and real-world testing, we propose a pseudo-labeling technique using a face recognition dataset consisting of various people and background noise. The use of our pseudo-labeled training dataset helps to overcome the lack of diversity among the people in the dataset. Our integrated framework is constructed using complementary multitask learning methods to extract robust features for each task. Furthermore, introducing pseudo-labeling and multitask learning improves face recognition performance by enabling the learning of pose-invariant features. Our method achieves state-of-the-art (SOTA) or near-SOTA performance on the AFLW2000-3D and BIWI datasets for facial landmark detection and head pose estimation, with competitive face verification performance on the IJB-C test dataset for face recognition. We demonstrate this through a novel testing methodology that categorizes cases as soft, medium, and hard based on the pose values of IJB-C. The proposed method achieves stable performance even when the dataset lacks diverse face identities.
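
    A rough sketch of confidence-filtered pseudo-labeling of the kind described above, assuming a hypothetical pretrained teacher that returns a pose, landmarks, and a confidence score; the paper's actual teacher model and filtering criteria are not specified here.

```python
from typing import Callable, List, Tuple

def build_pseudo_labels(
    teacher: Callable,            # assumed: image -> (pose, landmarks, confidence)
    unlabeled_images: List,
    conf_threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> Tuple[List, List]:
    """Keep only teacher predictions that look reliable enough to train on."""
    images, labels = [], []
    for img in unlabeled_images:
        pose, landmarks, confidence = teacher(img)
        if confidence >= conf_threshold:
            images.append(img)
            labels.append((pose, landmarks))
    return images, labels
```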

  • Article type: Journal Article
    In this paper, we propose a learning state evaluation method based on face detection and head pose estimation. The method is intended for mobile devices with weak computing power, so it is necessary to control the number of parameters of the face detection and head pose estimation networks. Firstly, we propose a ghost-and-attention-module (GA) based face detection network (GA-Face). GA-Face reduces the number of parameters and the computation in the feature extraction network through the ghost module, and focuses the network on important features through a parameter-free attention mechanism. We also propose a lightweight dual-branch (DB) head pose estimation network: DB-Net. Finally, we propose a student learning state evaluation algorithm. This algorithm can evaluate the learning state of students based on the distance between their faces and the screen, as well as their head pose. We validate the effectiveness of the proposed GA-Face and DB-Net on several standard face detection datasets and standard head pose estimation datasets. Finally, we validate, through practical cases, that the proposed online learning state assessment method can effectively assess the level of student attention and concentration and, due to its low computational complexity, does not interfere with the student's learning process.
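
    As a hedged illustration of the evaluation rule described above, the toy function below flags a student as attentive when an approximate face-to-screen distance (from a pinhole-camera estimate) and the head pose both fall inside thresholds. All constants and the distance heuristic are hypothetical, not taken from the paper.

```python
def attention_state(face_box_width_px, yaw_deg, pitch_deg,
                    focal_px=600.0, face_width_mm=160.0,
                    max_distance_mm=800.0, max_angle_deg=30.0):
    """Return True when the face is close enough and roughly facing the screen."""
    # Pinhole approximation: distance ~ focal * real_width / pixel_width.
    distance_mm = focal_px * face_width_mm / max(face_box_width_px, 1e-6)
    facing_screen = abs(yaw_deg) <= max_angle_deg and abs(pitch_deg) <= max_angle_deg
    return distance_mm <= max_distance_mm and facing_screen

print(attention_state(face_box_width_px=150, yaw_deg=5, pitch_deg=-10))  # True
```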

  • Article type: Journal Article
    Head pose estimation serves various applications, such as gaze estimation, fatigue driving detection, and virtual reality. Nonetheless, achieving precise and efficient predictions remains challenging owing to the reliance on singular data sources. Therefore, this study introduces a technique involving multimodal feature fusion to elevate head pose estimation accuracy. The proposed method amalgamates data derived from diverse sources, including RGB and depth images, to construct a comprehensive three-dimensional representation of the head, commonly referred to as a point cloud. The noteworthy innovations of this method encompass a residual multilayer perceptron structure within PointNet, designed to tackle gradient-related challenges, along with spatial self-attention mechanisms aimed at noise reduction. The enhanced PointNet and ResNet networks are used to extract features from the point clouds and images, and these extracted features are then fused. Furthermore, the incorporation of a scoring module strengthens robustness, particularly in scenarios involving facial occlusion; this is achieved by preserving features from the highest-scoring point cloud. Additionally, a prediction module is employed, combining classification and regression methodologies to accurately estimate head poses. The proposed method improves the accuracy and robustness of head pose estimation, especially in cases involving facial obstructions. These advancements are substantiated by experiments conducted using the BIWI dataset, demonstrating the superiority of this method over existing techniques.
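
    The residual multilayer-perceptron idea mentioned above can be sketched as a per-point block with a skip connection; the layer sizes and layout below (PyTorch) are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualPointMLP(nn.Module):
    """Residual MLP block of the kind added inside PointNet (assumed layout)."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_points, dim) per-point features; the skip connection
        # is what mitigates the gradient issues mentioned in the abstract.
        return self.act(self.body(x) + x)

features = torch.randn(2, 1024, 64)
print(ResidualPointMLP(64)(features).shape)  # torch.Size([2, 1024, 64])
```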

  • Article type: Journal Article
    Head pose estimation is an important technology for analyzing human behavior and has been widely researched and applied in areas such as human-computer interaction and fatigue detection. However, traditional head pose estimation networks suffer from the problem of easily losing spatial structure information, particularly in complex scenarios where occlusions and multiple object detections are common, resulting in low accuracy. To address these issues, we propose a head pose estimation model based on a residual network and a capsule network. Firstly, a deep residual network is used to extract features from three stages, capturing spatial structure information at different levels, and a global attention block is employed to enhance the spatial weighting of feature extraction. To effectively avoid the loss of spatial structure information, the features are encoded and transmitted to the output using an improved capsule network, whose generalization ability is enhanced through a self-attention routing mechanism. To enhance the robustness of the model, we optimize the Huber loss, which is used in head pose estimation for the first time. Finally, experiments are conducted on three popular public datasets: 300W-LP, AFLW2000, and BIWI. The results demonstrate that the proposed method achieves state-of-the-art results, particularly in scenarios with occlusions.
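
    For reference, a plain Huber loss over Euler-angle predictions looks like the sketch below (PyTorch); the paper optimizes a modified variant whose exact form is not restated in the abstract.

```python
import torch

def huber_loss(pred, target, delta=1.0):
    """Standard Huber loss on predicted vs. ground-truth Euler angles."""
    err = pred - target
    abs_err = err.abs()
    quadratic = 0.5 * err ** 2                # small errors: L2 behaviour
    linear = delta * (abs_err - 0.5 * delta)  # large errors: L1 behaviour
    return torch.where(abs_err <= delta, quadratic, linear).mean()

pred = torch.tensor([10.0, -5.0, 2.0])    # yaw, pitch, roll (degrees)
target = torch.tensor([12.0, -5.5, 1.0])
print(huber_loss(pred, target))
```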

  • Article type: Journal Article
    Drowsiness-related car accidents continue to have a significant effect on road safety. Many of these accidents could be prevented by alerting drivers once they start feeling drowsy. This work presents a non-invasive system for real-time driver drowsiness detection using visual features. These features are extracted from videos obtained from a camera installed on the dashboard. The proposed system uses facial landmark and face mesh detectors to locate the regions of interest, from which mouth aspect ratio, eye aspect ratio, and head pose features are extracted and fed to three different classifiers: a random forest, a sequential neural network, and a linear support vector machine. Evaluation of the proposed system on the National Tsing Hua University driver drowsiness detection dataset showed that it can successfully detect and alert drowsy drivers with an accuracy of up to 99%.
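
    The eye aspect ratio mentioned above is commonly computed from six eye landmarks as in the sketch below; the landmark ordering and the drowsiness threshold are assumptions, not the paper's values.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio from six 2-D eye landmarks (p1..p6), following the
    common EAR formulation."""
    p1, p2, p3, p4, p5, p6 = np.asarray(eye, dtype=float)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# A ratio staying below a tuned threshold (e.g. ~0.2) for several consecutive
# frames is the usual eye-closure cue fed to the drowsiness classifiers.
```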

  • Article type: Journal Article
    In recent years, the pandemic has forced the education system to shift from traditional teaching to online teaching or blended learning. The ability to monitor remote online examinations efficiently is a limiting factor for the scalability of this stage of online evaluation in the education system. Human proctoring is the most common approach, either by asking learners to take a test in examination centers or by visually monitoring learners who are asked to switch on their cameras. However, these methods require considerable labor, effort, infrastructure, and hardware. This paper presents an automated AI-based proctoring system, the 'Attentive system', for online evaluation by capturing live video of the examinee. Our Attentive system includes four components to estimate malpractice: face detection, multiple-person detection, face spoofing, and head pose estimation. Attentive Net detects faces and draws bounding boxes along with confidences. Attentive Net also checks the alignment of the face using the rotation matrix of an affine transformation. The FaceNet algorithm is combined with Attentive-Net to extract landmarks and facial features. The process for identifying spoofed faces is initiated only for aligned faces, using a shallow CNN liveness net. The head pose of the examinee is estimated using the SolvePnP equation to check whether he/she is seeking help from others. Crime Investigation and Prevention Lab (CIPL) datasets and customized datasets with various types of malpractice are used to evaluate our proposed system. Extensive experimental results demonstrate that our method is more accurate, reliable, and robust, and that the system can be practically implemented in a real-time environment as an automated proctoring system. An improved accuracy of 0.87 is reported with the combination of Attentive Net, the liveness net, and head pose estimation.
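
    Head pose via solvePnP, as referenced above, is typically set up as in the sketch below using a generic 3-D facial model and an approximate camera matrix; the model points, focal-length guess, and landmark choice are illustrative assumptions rather than the authors' configuration.

```python
import cv2
import numpy as np

# Generic 3-D facial model points (nose tip, chin, eye corners, mouth corners);
# widely used approximate coordinates, not the paper's own model.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye outer corner
    (225.0, 170.0, -135.0),    # right eye outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
], dtype=np.float64)

def head_pose_from_landmarks(image_points, frame_size):
    """Estimate head rotation/translation from six 2-D landmarks with solvePnP.

    image_points: (6, 2) float64 array of the corresponding 2-D landmarks.
    """
    h, w = frame_size
    focal = w  # rough focal-length guess when the camera is uncalibrated
    camera_matrix = np.array([[focal, 0, w / 2],
                              [0, focal, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume negligible lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                  camera_matrix, dist_coeffs,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    return rvec, tvec  # rvec can be converted to Euler angles via cv2.Rodrigues
```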

  • Article type: Journal Article
    Face recognition plays a significant role in many human-computer interaction devices and applications whose access control systems are based on the verification of facial biometric features. Though great improvements in recognition performance have been achieved, under specific conditions such as faces with occlusions, performance suffers a severe drop. Occlusion is one of the most significant reasons for the performance degradation of existing general face recognition systems. The biggest problem in occluded face recognition (OFR) lies in the lack of occluded face data. To mitigate this problem, this paper proposes a new OFR network, DOMG-OFR (Dynamic Occlusion Mask Generator based Occluded Face Recognition), which dynamically generates the most informative occluded face training samples at the feature level; in this way, the recognition model is always fed with the most valuable training samples, saving the labor of preparing synthetic data while simultaneously improving training efficiency. Besides, this paper also proposes a new module called the Decision Module (DM), in an attempt to combine the merits of the two mainstream methodologies in OFR: face image reconstruction based methods and face feature filtering based methods. Furthermore, to enable existing face de-occlusion methods, which mostly target near-frontal faces, to work well on faces under large poses, a head pose aware de-occlusion pipeline based on the Conditional Generative Adversarial Network (CGAN) is proposed. In the experiments, we also investigate the effects of occlusions on face recognition performance, and the validity and efficiency of our proposed decision-based OFR pipeline are fully demonstrated. By comparing both verification and recognition performance on real and synthetic occluded face datasets with other existing works, our proposed OFR architecture demonstrates obvious advantages over other works.

  • Article type: Journal Article
    Mainstream methods treat head pose estimation as a supervised classification/regression problem, whose performance heavily depends on the accuracy of ground-truth labels of the training data. However, it is rather difficult to obtain accurate head pose labels in practice, due to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a method which does not need to be trained with head pose labels, but instead matches keypoints between a reconstructed 3D face model and the 2D input image for head pose estimation. The proposed head pose estimation method consists of two components: 3D face reconstruction and 3D-2D keypoint matching. In the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. In the 3D-2D keypoint matching phase, an iterative optimization algorithm is proposed to match the keypoints between the reconstructed 3D face model and the 2D input image efficiently under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets, including Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most of the existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, although the model of the proposed method is not trained on any of these five datasets.
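
    The 3D-2D keypoint matching step can be illustrated by minimizing reprojection error over a 6-DOF pose under a perspective projection, as sketched below with a generic least-squares solver; the paper's own iterative algorithm and parameterization are not reproduced here.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points3d, pose, focal, center):
    """Perspective projection of 3-D keypoints under a 6-DOF pose
    (yaw, pitch, roll in degrees; tx, ty, tz in the camera frame)."""
    yaw, pitch, roll, tx, ty, tz = pose
    R = Rotation.from_euler("yxz", [yaw, pitch, roll], degrees=True).as_matrix()
    cam = points3d @ R.T + np.array([tx, ty, tz])
    return focal * cam[:, :2] / cam[:, 2:3] + center

def fit_pose(points3d, keypoints2d, focal=1000.0, center=(320.0, 240.0)):
    """Minimize 2-D reprojection error over the pose, starting from a frontal
    head roughly 1000 units in front of the camera.

    points3d: (N, 3) model keypoints; keypoints2d: (N, 2) detected keypoints.
    """
    center = np.asarray(center, dtype=float)

    def residual(pose):
        return (project(points3d, pose, focal, center) - keypoints2d).ravel()

    x0 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1000.0])
    return least_squares(residual, x0).x  # yaw, pitch, roll, tx, ty, tz
```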

  • Article type: Journal Article
    Accurate face segmentation strongly benefits human face image analysis. In this paper we propose a unified framework for face image analysis through end-to-end semantic face segmentation. The proposed framework contains a set of stacked components for face understanding, which includes head pose estimation, age classification, and gender recognition. A manually labeled face dataset is used for training the Conditional Random Field (CRF) based segmentation model. A multi-class face segmentation framework developed with CRFs segments a facial image into six parts. A probabilistic classification strategy is used, and probability maps are generated for each class. The probability maps are used as feature descriptors, and a Random Decision Forest (RDF) classifier is modeled for each task (head pose, age, and gender). We assess the performance of the proposed framework on several datasets and report better results compared with previously reported results.
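
    The per-task stage described above (probability maps used as feature descriptors feeding an RDF classifier) can be sketched with a random forest as a stand-in; the map shape and label discretization below are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_task_classifier(prob_maps, labels, n_trees=100):
    """Flatten each six-class probability map into a feature vector and fit a
    random forest (used here as a stand-in for the RDF classifier)."""
    X = np.asarray(prob_maps, dtype=float).reshape(len(prob_maps), -1)
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X, labels)
    return clf

# Hypothetical usage: prob_maps of shape (n_images, H, W, 6) and discretized
# head-pose labels, e.g. pose_clf = train_task_classifier(maps, pose_bins)
```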
