Keywords: capsule network; global attention block; head pose estimation; self-attention routing

Source: DOI:10.3390/e25071024

Abstract:
Head pose estimation is an important technology for analyzing human behavior and has been widely studied and applied in areas such as human-computer interaction and fatigue detection. However, traditional head pose estimation networks tend to lose spatial structure information, particularly in complex scenes where occlusion and multiple-object detection are common, which leads to low accuracy. To address these issues, we propose a head pose estimation model based on a residual network and a capsule network. First, a deep residual network extracts features at three stages, capturing spatial structure information at different levels, and a global attention block strengthens the spatial weighting of the extracted features. To avoid losing spatial structure information, the features are encoded and passed to the output by an improved capsule network, whose generalization ability is enhanced by a self-attention routing mechanism. To make the model more robust, we optimize the Huber loss, which is applied to head pose estimation here for the first time. Finally, experiments are conducted on three popular public datasets: 300W-LP, AFLW2000, and BIWI. The results show that the proposed method achieves state-of-the-art performance, particularly in scenarios with occlusion.
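As a rough illustration of the loss term mentioned above, the sketch below implements the standard Huber loss averaged over (yaw, pitch, roll) angle triples. The abstract does not say how the authors optimize the Huber loss, so this is not their variant; the function names `huber` and `head_pose_huber_loss` and the default threshold `delta = 1.0` are hypothetical choices for illustration only.

```python
from typing import Sequence


def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss for one residual (NOT the paper's optimized variant)."""
    a = abs(residual)
    if a <= delta:
        # Quadratic region: behaves like 0.5 * squared error for small errors.
        return 0.5 * a * a
    # Linear region: grows like absolute error, limiting the pull of outliers.
    return delta * (a - 0.5 * delta)


def head_pose_huber_loss(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    delta: float = 1.0,
) -> float:
    """Mean Huber loss over a batch of (yaw, pitch, roll) triples in degrees.

    `delta` is a hypothetical hyperparameter; the paper's value is not given.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    count = 0
    for p, t in zip(pred, target):
        if len(p) != 3 or len(t) != 3:
            raise ValueError("each sample must be a (yaw, pitch, roll) triple")
        for p_angle, t_angle in zip(p, t):
            total += huber(p_angle - t_angle, delta)
            count += 1
    return total / count


if __name__ == "__main__":
    # One sample with a large roll error, e.g. a hard, heavily occluded face.
    loss = head_pose_huber_loss([[10.0, -5.0, 2.0]], [[10.5, -5.0, 40.0]])
    print(loss)
```

For a prediction `[10.0, -5.0, 2.0]` against a target `[10.5, -5.0, 40.0]`, the 38-degree roll error adds 37.5 to the sum under this Huber loss but 722 under the squared error 0.5·r². Capping the growth of large residuals in this way is the usual reason Huber loss is chosen for robustness, which plausibly matters for occluded samples.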