Keywords: PointNet; ResNet; head pose estimation; multi-source feature fusion; point cloud data

Source: DOI:10.3390/s23249894

Abstract:
Head pose estimation supports applications such as gaze estimation, driver fatigue detection, and virtual reality. Precise and efficient prediction nevertheless remains challenging because existing approaches rely on a single data source. This study therefore introduces a multimodal feature fusion technique to improve head pose estimation accuracy. The proposed method combines data from multiple sources, including RGB and depth images, to construct a three-dimensional representation of the head in the form of a point cloud. Its main innovations are a residual multilayer perceptron structure within PointNet, designed to address gradient-related problems, and spatial self-attention mechanisms aimed at noise reduction. The enhanced PointNet and ResNet networks extract features from the point clouds and the images, and the extracted features are then fused. A scoring module strengthens robustness, particularly under facial occlusion, by preserving features from the highest-scoring point cloud. A prediction module then combines classification and regression to estimate the head pose. Together, these components improve the accuracy and robustness of head pose estimation, especially when the face is partially occluded. Experiments on the BIWI dataset support these claims and show that the method outperforms existing techniques.
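The abstract does not say how the RGB and depth images are turned into a point cloud. One common way to do this, shown here only as a minimal sketch and not as the authors' pipeline, is to back-project each valid depth pixel through a pinhole camera model and attach the colour of the aligned RGB pixel. The function name `depth_rgb_to_point_cloud`, the intrinsics `fx`, `fy`, `cx`, `cy`, the millimetre depth unit, and the assumption that the two images are already pixel-aligned are all illustrative choices, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# One coloured 3D point: (x, y, z) in metres plus (r, g, b).
Point = Tuple[float, float, float, int, int, int]


def depth_rgb_to_point_cloud(
    depth: Sequence[Sequence[float]],
    rgb: Sequence[Sequence[Tuple[int, int, int]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depth_scale: float = 0.001,  # assumption: raw depth is in millimetres
) -> List[Point]:
    """Back-project valid depth pixels into a coloured point cloud.

    Assumes a pinhole camera and that `depth` and `rgb` are registered,
    i.e. pixel (u, v) refers to the same scene point in both images.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                # Non-positive depth is treated as a missing measurement.
                continue
            z = d * depth_scale
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            r, g, b = rgb[v][u]
            points.append((x, y, z, r, g, b))
    return points
```

Each returned tuple carries both geometry and colour, so a point-based network can consume the coordinates while an image network processes the original RGB frame.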