Keywords: General movement assessment; Infant pose estimation; Multi-view videos; Self-supervision

MeSH: Humans; Video Recording; Infant; Imaging, Three-Dimensional / methods; Posture / physiology; Cerebral Palsy / diagnostic imaging, physiopathology; Algorithms; Supervised Machine Learning

Source: DOI: 10.1016/j.media.2024.103208

Abstract:
General movement and pose assessment of infants is crucial for the early detection of cerebral palsy (CP). Nevertheless, most human pose estimation methods, whether 2D or 3D, focus on adults because large datasets with infant pose annotations are lacking. To address this, we present YOLO-infantPose, a fine-tuned model for 2D infant pose estimation, and further propose STAPose3D, a self-supervised model for video-based 3D infant pose estimation. To cope with the absence of 3D pose annotations, we train on multi-view video data. STAPose3D combines temporal convolution, temporal attention, and graph attention to jointly learn the spatio-temporal features of infant pose. Our method comprises two stages: YOLO-infantPose is applied to the input videos, and the resulting 2D poses, together with per-joint confidences, are then lifted to 3D. Using the best-performing 2D detector in the first stage significantly improves the precision of 3D pose estimation. The fine-tuned YOLO-infantPose outperforms other models tested on our clinical dataset as well as on two public datasets, MINI-RGBD and YouTube-Infant. Results on our infant movement video dataset demonstrate that STAPose3D effectively captures spatio-temporal features across different views and significantly improves 3D infant pose estimation in videos. Finally, we explore the clinical application of our method to general movement assessment (GMA) on a clinical dataset annotated, according to GMA standards, as either normal writhing movements or abnormal monotonic movements. The 3D pose estimation results produced by our STAPose3D model significantly boost GMA prediction performance compared with 2D pose estimation. Our code is available at github.com/wwYinYin/STAPose3D.
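The two-stage pipeline described above can be sketched in miniature: stage 1 yields per-frame 2D joint coordinates with confidences, and stage 2 lifts them to 3D using temporal context. The shapes, joint count, temporal smoothing, and toy lifting weights below are illustrative assumptions, not the authors' STAPose3D implementation (which uses learned temporal convolution, temporal attention, and graph attention).

```python
import numpy as np

T, J = 27, 17  # assumed: frames per clip, joints per infant skeleton


def lift_2d_to_3d(pose2d: np.ndarray, conf: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Lift (T, J, 2) 2D poses plus (T, J) per-joint confidences to (T, J, 3).

    Per-joint input features are (x, y, confidence), mirroring the abstract's
    "2D poses along with respective confidences for every joint". A 3-frame
    moving average stands in for the temporal convolution/attention stack,
    and a fixed linear map stands in for the learned lifting network.
    """
    feats = np.concatenate([pose2d, conf[..., None]], axis=-1)      # (T, J, 3)
    # Toy temporal context: average each frame with its two neighbours.
    padded = np.pad(feats, ((1, 1), (0, 0), (0, 0)), mode="edge")
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0      # (T, J, 3)
    return smoothed @ w                                             # (T, J, 3)


rng = np.random.default_rng(0)
pose2d = rng.standard_normal((T, J, 2))          # stage-1 output (stand-in)
conf = rng.uniform(0.5, 1.0, size=(T, J))        # per-joint confidences
w = rng.standard_normal((3, 3)) * 0.1            # toy lifting weights
pose3d = lift_2d_to_3d(pose2d, conf, w)
print(pose3d.shape)  # (27, 17, 3)
```

In the paper's setting, the lifting network is trained self-supervised from multi-view videos, so no 3D annotations are needed; the sketch only conveys the data flow from 2D detections to a 3D pose sequence.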