Keywords: 3D face reconstruction; computer vision; facial keypoints matching; head pose estimation

Source: DOI:10.3390/s21051841

Abstract:
Mainstream methods treat head pose estimation as a supervised classification or regression problem, so their performance depends heavily on the accuracy of the ground-truth labels in the training data. In practice, however, accurate head pose labels are difficult to obtain because effective equipment and reasonable labeling approaches are lacking. In this paper, we propose a head pose estimation method that does not need to be trained with head pose labels; instead, it matches keypoints between a reconstructed 3D face model and the 2D input image. The method consists of two components: 3D face reconstruction and 3D-2D keypoint matching. In the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks that are jointly optimized with an asymmetric Euclidean loss and a keypoint loss. In the 3D-2D keypoint matching phase, an iterative optimization algorithm efficiently matches the keypoints between the reconstructed 3D face model and the 2D input image under the constraint of perspective transformation. The method is extensively evaluated on five widely used head pose estimation datasets: Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that it achieves excellent cross-dataset performance and surpasses most existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, even though the model is not trained on any of these five datasets.
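
The idea of matching 3D model keypoints to 2D image keypoints under a perspective projection can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify its iterative optimizer, so a simple compass (pattern) search over three Euler angles stands in for it. The keypoint coordinates, the Euler-angle convention, the pinhole camera with focal length `focal`, and the assumption that the head translation is already known are all hypothetical choices made only for illustration.

```python
import math

# Hypothetical 3D face keypoints in model units (x right, y down, z away
# from the camera). These values are illustrative, not from the paper.
MODEL_POINTS = [
    (0.0, 0.0, 0.0),      # nose tip
    (-30.0, -35.0, 30.0),  # left eye outer corner
    (30.0, -35.0, 30.0),   # right eye outer corner
    (-25.0, 30.0, 25.0),   # left mouth corner
    (25.0, 30.0, 25.0),    # right mouth corner
    (0.0, 65.0, 20.0),     # chin
]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Assumed convention: R = Rz(roll) * Ry(yaw) * Rx(pitch), in degrees."""
    y, p, r = (math.radians(a) for a in (yaw, pitch, roll))
    rx = [[1, 0, 0], [0, math.cos(p), -math.sin(p)], [0, math.sin(p), math.cos(p)]]
    ry = [[math.cos(y), 0, math.sin(y)], [0, 1, 0], [-math.sin(y), 0, math.cos(y)]]
    rz = [[math.cos(r), -math.sin(r), 0], [math.sin(r), math.cos(r), 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def project(points, angles, translation, focal):
    """Rotate and translate the 3D points, then apply a pinhole projection."""
    rot = rotation_matrix(*angles)
    projected = []
    for pt in points:
        x, y, z = (sum(rot[i][k] * pt[k] for k in range(3)) + translation[i]
                   for i in range(3))
        projected.append((focal * x / z, focal * y / z))
    return projected


def reprojection_error(angles, points, observed, translation, focal):
    """Sum of squared pixel distances between projected and observed keypoints."""
    projected = project(points, angles, translation, focal)
    return sum((u - ou) ** 2 + (v - ov) ** 2
               for (u, v), (ou, ov) in zip(projected, observed))


def estimate_pose(points, observed, translation, focal, step=8.0, min_step=1e-4):
    """Compass search over (yaw, pitch, roll): try +/- step on each angle,
    keep the best improving move, and halve the step when nothing improves."""
    angles = [0.0, 0.0, 0.0]
    best = reprojection_error(angles, points, observed, translation, focal)
    while step > min_step:
        candidates = []
        for axis in range(3):
            for delta in (step, -step):
                trial = list(angles)
                trial[axis] += delta
                err = reprojection_error(trial, points, observed, translation, focal)
                candidates.append((err, trial))
        err, trial = min(candidates, key=lambda c: c[0])
        if err < best:
            best, angles = err, trial
        else:
            step /= 2.0
    return tuple(angles), best


if __name__ == "__main__":
    true_angles = (20.0, -10.0, 5.0)       # synthetic ground truth (degrees)
    translation = (0.0, 0.0, 500.0)        # assumed known in this sketch
    observed = project(MODEL_POINTS, true_angles, translation, 500.0)
    estimate, error = estimate_pose(MODEL_POINTS, observed, translation, 500.0)
    print("estimated (yaw, pitch, roll):", estimate, "error:", error)
```

The sketch recovers the rotation from synthetic, noise-free keypoints. A practical system would also need to estimate translation, cope with keypoint noise, and would likely use a gradient-based or closed-form perspective-n-point solver rather than a derivative-free search.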