Keywords: 3D hand mesh estimation; 3D hand pose estimation; GCN; Graphformer; HandGCNFormer; Transformer

Source: DOI:10.3389/fnbot.2024.1395652

Abstract:
In Human-Robot Interaction (HRI), accurate 3D hand pose and mesh estimation are critically important. However, inferring plausible and accurate poses under severe self-occlusion and high self-similarity remains an inherent challenge. To alleviate the ambiguity caused by invisible and similar joints during HRI, we propose a topology-aware Transformer network named HandGCNFormer. It takes a depth image as input and incorporates prior knowledge of hand kinematic topology into the network while modeling long-range contextual information. Specifically, we propose a novel Graphformer decoder with an additional Node-offset Graph Convolutional layer (NoffGConv). The Graphformer decoder optimizes the synergy between the Transformer and the GCN, capturing both long-range dependencies and local topological connections between joints. In addition, we replace the standard MLP prediction head with a novel Topology-aware head that better exploits local topological constraints to produce more plausible and accurate poses. Our method achieves state-of-the-art 3D hand pose estimation performance on four challenging datasets: Hands2017, NYU, ICVL, and MSRA. To further demonstrate the effectiveness and scalability of the proposed Graphformer decoder and Topology-aware head, we extend our framework to HandGCNFormer-Mesh for the 3D hand mesh estimation task. The extended framework integrates a shape regressor with the original Graphformer decoder and Topology-aware head to produce MANO parameters. Results on the HO-3D dataset, which contains varied and challenging occlusions, show that HandGCNFormer-Mesh is competitive with previous state-of-the-art 3D hand mesh estimation methods.
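
The idea of pairing global attention with skeleton-local graph convolution can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify NoffGConv or the Topology-aware head, so the sketch substitutes a standard symmetric-normalized graph convolution and single-head self-attention. The 21-joint layout (wrist plus four joints per finger), the feature width, the random weights, and every function name below are assumptions made for illustration only.

```python
import numpy as np

NUM_JOINTS = 21  # assumption: wrist (0) + 4 joints for each of 5 fingers


def hand_edges():
    """Parent-child edges of a hypothetical 21-joint hand skeleton."""
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to the first joint of the finger
        for k in range(3):
            edges.append((base + k, base + k + 1))  # chain along the finger
    return edges


def normalized_adjacency(edges, n):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over all joints (global)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def graph_conv(x, a_hat, w):
    """Plain graph convolution: each joint mixes features of its neighbors."""
    return a_hat @ x @ w


def toy_decoder_layer(x, a_hat, params):
    """Attention for long-range context, then graph conv for local topology."""
    x = x + self_attention(x, params["wq"], params["wk"], params["wv"])
    x = x + np.maximum(graph_conv(x, a_hat, params["wg"]), 0.0)  # ReLU
    return x


rng = np.random.default_rng(0)
dim = 16  # hypothetical feature width per joint query
a_hat = normalized_adjacency(hand_edges(), NUM_JOINTS)
params = {name: rng.normal(scale=0.1, size=(dim, dim))
          for name in ("wq", "wk", "wv", "wg")}
joint_queries = rng.normal(size=(NUM_JOINTS, dim))
out = toy_decoder_layer(joint_queries, a_hat, params)
print(out.shape)  # (21, 16)
```

In this toy layer the attention step lets every joint attend to every other joint, which is what helps with occluded or look-alike fingers, while the graph convolution only passes information along skeleton edges, so a fingertip is influenced directly by its own finger chain rather than by a neighboring fingertip.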