Keywords: 3D skeleton action recognition attention spatial-temporal transformer

MeSH: Human Activities; Humans; Motion; Neural Networks, Computer; Skeleton

Source: DOI:10.3390/s21165339

Abstract:
Skeleton-based human action recognition has made great progress, especially with the development of graph convolutional networks (GCNs). The most influential work is ST-GCN, which automatically learns both spatial and temporal patterns from skeleton sequences. However, this method has a limitation: because the receptive field of graph convolution is limited, it captures only short-range correlations, whereas long-range dependencies are essential for recognizing human actions. In this work, we propose a spatial-temporal relative transformer (ST-RT) to overcome this limitation. By introducing relay nodes, ST-RT prevents the transformer architecture from breaking the inherent skeleton topology in the spatial dimension and the order of the skeleton sequence in the temporal dimension. Furthermore, we mine the dynamic information contained in motion at different scales. Finally, four ST-RTs, which extract spatial-temporal features from four kinds of skeleton sequence, are fused to form the final model, the multi-stream spatial-temporal relative transformer (MSST-RT), to enhance performance. Extensive experiments evaluate the proposed methods on three benchmarks for skeleton-based action recognition: NTU RGB+D, NTU RGB+D 120 and UAV-Human. The results demonstrate that MSST-RT performs on par with state-of-the-art methods.
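
The abstract mentions two ideas that a small example can make concrete: motion information at different temporal scales, and the fusion of several streams into one prediction. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes that "motion at scale s" means the coordinate difference between frames s steps apart, and that the streams are combined by a weighted sum of per-stream softmax scores. The function names `motion_at_scale` and `fuse_streams` are hypothetical, and the ST-RT network itself is not modeled.

```python
import math


def motion_at_scale(frames, scale):
    """Frame differences `scale` steps apart: frames[t + scale] - frames[t].

    `frames` is a list of T frames; each frame is a list of joints; each
    joint is a list of coordinates. The result has T - scale frames.
    Assumption: this is one plausible reading of "motion at different scales".
    """
    return [
        [[b - a for a, b in zip(j0, j1)] for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[scale:])
    ]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights=None):
    """Late fusion: weighted sum of per-stream softmax scores.

    Assumption: the abstract only says the four streams "are fused";
    score-level fusion is used here as a common stand-in.
    Returns (predicted class index, fused scores).
    """
    if weights is None:
        weights = [1.0] * len(stream_logits)
    probs = [softmax(logits) for logits in stream_logits]
    n_classes = len(probs[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__), fused


if __name__ == "__main__":
    # Toy sequence: 3 frames, 2 joints, 3D coordinates.
    frames = [
        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        [[0.5, 0.0, 0.0], [1.0, 1.0, 0.0]],
        [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]],
    ]
    print(motion_at_scale(frames, 1))
    print(motion_at_scale(frames, 2))

    # Toy logits from four streams over two action classes.
    label, fused = fuse_streams([[2.0, 0.0], [0.0, 1.0], [3.0, 0.0], [1.0, 0.0]])
    print(label, fused)
```

In this toy setup a larger `scale` captures slower, coarser movement, while `scale=1` captures frame-to-frame motion. Fusing at the score level keeps each stream independent, so a stream can be added or dropped without retraining the others.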