Head pose estimation

  • Article Type: Journal Article
    In this paper, we propose a learning state evaluation method based on face detection and head pose estimation. The method targets mobile devices with weak computing power, so the number of parameters of the face detection and head pose estimation networks must be controlled. Firstly, we propose a ghost-and-attention-module (GA) based face detection network, GA-Face. GA-Face reduces the number of parameters and the amount of computation in the feature extraction network through the ghost module, and focuses the network on important features through a parameter-free attention mechanism. We also propose a lightweight dual-branch (DB) head pose estimation network, DB-Net. Finally, we propose a student learning state evaluation algorithm, which evaluates the learning state of students from the distance between their faces and the screen as well as from their head poses. We validate the effectiveness of the proposed GA-Face and DB-Net on several standard face detection and head pose estimation datasets, and we verify through practical cases that the proposed online learning state assessment method can effectively assess student attention and concentration and, owing to its low computational complexity, does not interfere with the student's learning process.
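    The abstract does not give GA-Face's exact layer configuration, but the ghost module it builds on is well documented in GhostNet: a small primary convolution produces a few intrinsic feature maps, and cheap depthwise convolutions derive the remaining "ghost" maps from them. A minimal PyTorch sketch of that idea, with channel counts, ratio, and kernel size as assumptions:

    ```python
    import torch
    import torch.nn as nn

    class GhostModule(nn.Module):
        """Ghost module sketch (GhostNet-style): the cheap depthwise path
        generates roughly half of the output channels from the intrinsic
        maps, cutting parameters versus a plain convolution."""
        def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
            super().__init__()
            init_ch = out_ch // ratio       # intrinsic maps (out_ch divisible by ratio)
            ghost_ch = out_ch - init_ch     # maps produced by the cheap path
            self.primary = nn.Sequential(
                nn.Conv2d(in_ch, init_ch, 1, bias=False),
                nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
            self.cheap = nn.Sequential(
                nn.Conv2d(init_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                          groups=init_ch, bias=False),  # depthwise: one filter per map
                nn.BatchNorm2d(ghost_ch), nn.ReLU(inplace=True))

        def forward(self, x):
            y = self.primary(x)                          # (B, init_ch, H, W)
            return torch.cat([y, self.cheap(y)], dim=1)  # (B, out_ch, H, W)
    ```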

  • Article Type: Journal Article
    Head pose estimation serves various applications, such as gaze estimation, fatigue driving detection, and virtual reality. Nonetheless, achieving precise and efficient predictions remains challenging owing to the reliance on a single data source. Therefore, this study introduces a multimodal feature fusion technique to improve the accuracy of head pose estimation. The proposed method merges data from diverse sources, including RGB and depth images, to construct a comprehensive three-dimensional representation of the head, commonly referred to as a point cloud. The noteworthy innovations of this method are a residual multilayer perceptron structure within PointNet, designed to tackle gradient-related challenges, and spatial self-attention mechanisms aimed at noise reduction. The enhanced PointNet and ResNet networks are used to extract features from the point clouds and images, respectively, and the extracted features are then fused. Furthermore, a scoring module strengthens robustness, particularly in scenarios involving facial occlusion, by preserving the features of the highest-scoring point clouds. Additionally, a prediction module combines classification and regression methodologies to estimate head poses accurately. The proposed method improves the accuracy and robustness of head pose estimation, especially in cases involving facial occlusion. These advances are substantiated by experiments on the BIWI dataset, demonstrating the superiority of the method over existing techniques.
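    The residual multilayer perceptron inside PointNet is described only at a high level; the sketch below shows the general pattern under assumed layer widths: a shared per-point MLP (a 1x1 convolution over the point dimension) wrapped in an identity skip connection to ease gradient flow.

    ```python
    import torch.nn as nn

    class ResidualPointMLP(nn.Module):
        """Residual shared-MLP block over per-point features: two 1x1
        convolutions with an identity skip, so gradients can bypass the
        block. The channel width is an assumption."""
        def __init__(self, channels=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Conv1d(channels, channels, 1, bias=False),
                nn.BatchNorm1d(channels), nn.ReLU(inplace=True),
                nn.Conv1d(channels, channels, 1, bias=False),
                nn.BatchNorm1d(channels))
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):                 # x: (batch, channels, num_points)
            return self.act(x + self.mlp(x))  # identity skip eases gradient flow
    ```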

  • Article Type: Journal Article
    Head pose estimation is an important technology for analyzing human behavior and has been widely researched and applied in areas such as human-computer interaction and fatigue detection. However, traditional head pose estimation networks are prone to losing spatial structure information, particularly in complex scenes where occlusions and multiple objects are common, resulting in low accuracy. To address these issues, we propose a head pose estimation model based on a residual network and a capsule network. Firstly, a deep residual network extracts features at three stages, capturing spatial structure information at different levels, and a global attention block enhances the spatial weighting of feature extraction. To avoid losing spatial structure information, the features are encoded and passed to the output by an improved capsule network whose generalization ability is enhanced through a self-attention routing mechanism. To improve the robustness of the model, we optimize the Huber loss, which is applied to head pose estimation for the first time. Finally, experiments are conducted on three popular public datasets: 300W-LP, AFLW2000, and BIWI. The results demonstrate that the proposed method achieves state-of-the-art results, particularly in scenarios with occlusions.
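    The abstract does not spell out how the Huber loss is modified, but the standard form it starts from is quadratic for small errors and linear for large ones, which damps the influence of outlier pose labels. A minimal sketch, with the threshold delta as an assumed hyperparameter:

    ```python
    import torch

    def huber_loss(pred, target, delta=1.0):
        """Quadratic within +/- delta, linear outside, so large label
        errors contribute bounded gradients (the paper optimizes a
        variant of this; its exact form is not in the abstract)."""
        err = torch.abs(pred - target)
        quad = torch.clamp(err, max=delta)   # |e| capped at delta
        return torch.mean(0.5 * quad ** 2 + delta * (err - quad))
    ```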

  • Article Type: Journal Article
    Head pose estimation is one of the essential tasks in computer vision; it predicts the Euler angles of the head in an image. In recent years, CNN-based methods for head pose estimation have achieved excellent performance. Their training relies on RGB images annotated with facial landmarks or on depth images from RGBD cameras. However, labeling facial landmarks is difficult for large-angle head poses in RGB images, and RGBD cameras are unsuitable for outdoor scenes. We propose a simple and effective annotation method for head poses in RGB images. The novel method uses a 3D virtual human head to simulate the head pose in an RGB image; the Euler angles can be calculated from the change in the coordinates of the 3D virtual head. We then use our annotation method to create the 2DHeadPose dataset, which covers a rich set of attributes, dimensions, and angles. Finally, we propose Gaussian label smoothing to suppress annotation noise and reflect inter-class relationships, and establish a baseline approach that uses it. Experiments demonstrate that our annotation method, dataset, and Gaussian label smoothing are highly effective, and our baseline approach surpasses most current state-of-the-art methods. The annotation tool, dataset, and source code are publicly available at https://github.com/youngnuaa/2DHeadPose.
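    The exact bin layout and smoothing width are not given in the abstract; the sketch below illustrates Gaussian label smoothing over discretized Euler angles under assumed HopeNet-style 3-degree bins: neighboring bins receive probability mass from a Gaussian centered on the annotated angle, so annotation errors of a few degrees shift mass rather than flip the target.

    ```python
    import numpy as np

    def gaussian_soft_label(angle, bin_centers, sigma=3.0):
        """Soft target for one Euler angle: a normalized Gaussian over
        the bin centers instead of a one-hot bin. sigma is assumed."""
        centers = np.asarray(bin_centers, dtype=np.float64)
        weights = np.exp(-0.5 * ((centers - angle) / sigma) ** 2)
        return weights / weights.sum()

    # Example with assumed 3-degree bins spanning [-99, 99] degrees:
    bin_centers = np.arange(-99, 100, 3)          # 67 bins
    soft = gaussian_soft_label(angle=27.0, bin_centers=bin_centers)
    ```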

  • Article Type: Journal Article
    As an important task in computer vision, head pose estimation has been widely applied in both academia and industry. However, two challenges remain in the field: (1) even for the same task (e.g., tiredness detection), existing algorithms usually treat the estimation of the three angles (i.e., roll, yaw, and pitch) as separate facets, disregarding their interplay as well as their differences, and thus share the same parameters across all layers; and (2) the discontinuity in angle estimation inevitably reduces accuracy. To solve these two problems, this study proposes a THESL-Net (tiered head pose estimation with self-adjust loss network) model. Specifically, we first propose stepped estimation using distinct network layers, gaining greater freedom during angle estimation. Furthermore, the reasons for the discontinuity in angle estimation are revealed: they include not only labeling the dataset with quaternions or Euler angles, but also loss functions that simply add the classification and regression losses. Subsequently, a self-adjusting constraint is applied to the loss function, making the angle estimation more consistent. Finally, to examine the influence of different angle ranges on the proposed model, experiments are conducted on three popular public benchmark datasets, BIWI, AFLW2000, and UPNA, demonstrating that the proposed model outperforms state-of-the-art approaches.
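    As a reference point for challenge (2), the sketch below shows the plain "classification plus regression" pose loss that the abstract identifies as a source of discontinuity; the fixed weight alpha is what THESL-Net's self-adjusting constraint would replace, and the abstract does not give that constraint's exact form.

    ```python
    import torch
    import torch.nn.functional as F

    def combined_angle_loss(logits, gt_angle, bin_centers, alpha=1.0):
        """Cross-entropy over angle bins plus an expectation-based
        regression term, added with a fixed weight alpha (assumption)."""
        # Approximate bin index by inserting the label into the sorted centers.
        gt_bin = torch.bucketize(gt_angle, bin_centers).clamp(max=logits.size(1) - 1)
        cls_loss = F.cross_entropy(logits, gt_bin)
        # Soft-argmax: expected angle under the predicted bin distribution.
        expected = (F.softmax(logits, dim=1) * bin_centers).sum(dim=1)
        reg_loss = F.mse_loss(expected, gt_angle)
        return cls_loss + alpha * reg_loss
    ```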

  • Article Type: Journal Article
    Face recognition plays a significant role in many human-computer interaction devices and applications whose access control systems are based on the verification of facial biometric features. Although great improvements in recognition performance have been achieved, under some specific conditions, such as faces with occlusions, performance suffers a severe drop. Occlusion is one of the most significant causes of performance degradation in existing general face recognition systems, and the biggest problem in occluded face recognition (OFR) lies in the lack of occluded face data. To mitigate this problem, this paper proposes a new OFR network, DOMG-OFR (Dynamic Occlusion Mask Generator based Occluded Face Recognition), which dynamically generates the most informative occluded face training samples at the feature level. In this way, the recognition model is always fed the most valuable training samples, saving the labor of preparing synthetic data while improving training efficiency. This paper also proposes a new module called the Decision Module (DM), which combines the merits of the two mainstream methodologies in OFR: face image reconstruction and face feature filtering. Furthermore, to enable existing face de-occlusion methods, which mostly target near-frontal faces, to work well on faces under large poses, a head-pose-aware de-occlusion pipeline based on a Conditional Generative Adversarial Network (CGAN) is proposed. In the experiments, we also investigate the effect of occlusions on face recognition performance, and the validity and efficiency of the proposed decision-based OFR pipeline are fully demonstrated. Comparisons of verification and recognition performance against existing works on both real and synthetic occluded face datasets show that the proposed OFR architecture has clear advantages.
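    DOMG-OFR's mask generator is driven by the model state, and the abstract does not describe its internals. Purely as an illustration of the underlying idea, synthesizing occluded training samples at the feature level rather than in pixel space, here is a random-mask sketch (mask shape and size bounds are assumptions):

    ```python
    import random
    import torch
    import torch.nn as nn

    class RandomFeatureOcclusion(nn.Module):
        """Zeroes a random spatial block of a feature map to mimic an
        occluded face at feature level. Illustration only: DOMG-OFR
        picks its masks dynamically for informativeness, not at random."""
        def __init__(self, max_frac=0.5):
            super().__init__()
            self.max_frac = max_frac

        def forward(self, feat):                  # feat: (B, C, H, W)
            _, _, h, w = feat.shape
            mh = max(1, int(h * random.uniform(0.1, self.max_frac)))
            mw = max(1, int(w * random.uniform(0.1, self.max_frac)))
            top, left = random.randint(0, h - mh), random.randint(0, w - mw)
            mask = torch.ones_like(feat)
            mask[:, :, top:top + mh, left:left + mw] = 0.0
            return feat * mask
    ```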

  • Article Type: Journal Article
    Mainstream methods treat head pose estimation as a supervised classification/regression problem whose performance depends heavily on the accuracy of the ground-truth labels of the training data. However, accurate head pose labels are difficult to obtain in practice, owing to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a method that does not need to be trained with head pose labels but instead estimates head pose by matching keypoints between a reconstructed 3D face model and the 2D input image. The proposed method consists of two components: 3D face reconstruction and 3D-2D keypoint matching. In the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. In the 3D-2D keypoint matching phase, an iterative optimization algorithm efficiently matches the keypoints between the reconstructed 3D face model and the 2D input image under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets: Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most of the existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, even though the model is not trained on any of these five datasets.
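    The paper's iterative optimizer is not detailed in the abstract, but the underlying problem, recovering pose from 3D model keypoints and their 2D image matches under perspective projection, is the classic PnP setup. A sketch using OpenCV's standard solver, with the focal-length guess and the Euler-angle convention as assumptions:

    ```python
    import cv2
    import numpy as np

    def head_pose_from_keypoints(model_pts, image_pts, img_w, img_h):
        """Estimate head pose from N matched keypoints: model_pts (N, 3)
        on the reconstructed 3D face, image_pts (N, 2) in the image."""
        focal = float(img_w)                      # crude focal-length guess
        K = np.array([[focal, 0.0, img_w / 2.0],
                      [0.0, focal, img_h / 2.0],
                      [0.0, 0.0, 1.0]])           # pinhole intrinsics, no distortion
        ok, rvec, tvec = cv2.solvePnP(
            np.asarray(model_pts, dtype=np.float64),
            np.asarray(image_pts, dtype=np.float64),
            K, None, flags=cv2.SOLVEPNP_ITERATIVE)
        R, _ = cv2.Rodrigues(rvec)                # axis-angle -> rotation matrix
        # One common ZYX Euler decomposition; axis naming depends on the frame.
        pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
        yaw = np.degrees(np.arcsin(np.clip(-R[2, 0], -1.0, 1.0)))
        roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
        return yaw, pitch, roll
    ```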