Keywords: CNN, Polyp Segmentation, Video

Source: DOI:10.1007/s11548-024-03244-6

Abstract:
OBJECTIVE: Single-image UNet architectures, commonly employed for polyp segmentation, lack the temporal insight clinicians gain from video data when diagnosing polyps. To mirror clinical practice more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with minimal parameter overhead, potentially making it suitable for edge devices.
METHODS: PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce parameter count. Its temporal fusion module, a Convolutional Long Short-Term Memory (ConvLSTM), effectively exploits temporal features. Our primary novelty lies in PolypNextLSTM itself, which stands out as the leanest and fastest model while surpassing the performance of state-of-the-art image- and video-based deep learning models. Evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts such as fast motion and occlusion.
RESULTS: Comparison against 5 image-based and 5 video-based models demonstrates PolypNextLSTM's superiority, achieving a Dice score of 0.7898 on the hard-to-detect polyp test set, surpassing image-based PraNet (0.7519) and video-based PNS+ (0.7486). Notably, our model excels on videos featuring complex artefacts such as ghosting and occlusion.
CONCLUSIONS: PolypNextLSTM, integrating a pruned ConvNext-Tiny backbone with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also achieves the highest frames per second among the evaluated models. Code can be found here: https://github.com/mtec-tuhh/PolypNextLSTM .
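To make the encoder-ConvLSTM-decoder idea concrete, the following is a minimal PyTorch sketch of the general pattern the abstract describes: per-frame features are extracted by a 2D encoder, fused across time by a ConvLSTM, and projected to per-frame masks. This is not the authors' implementation (see the linked repository for that): the tiny convolutional encoder here is a hypothetical stand-in for the pruned ConvNext-Tiny backbone, and all class and parameter names (`ConvLSTMCell`, `VideoSegSketch`, `feat_ch`) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Convolutional analogue of an LSTM cell: gates are computed by a conv."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class VideoSegSketch(nn.Module):
    """Toy video-segmentation pipeline: encoder -> ConvLSTM over frames -> head.

    The two-layer strided encoder is a placeholder for a real backbone
    (e.g. a truncated ConvNext-Tiny); it downsamples by a factor of 4.
    """

    def __init__(self, feat_ch: int = 16):
        super().__init__()
        self.feat_ch = feat_ch
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.lstm = ConvLSTMCell(feat_ch, feat_ch)
        self.head = nn.Conv2d(feat_ch, 1, 1)  # 1-channel mask logits
        self.up = nn.Upsample(scale_factor=4, mode="bilinear",
                              align_corners=False)

    def forward(self, clip):  # clip: (B, T, 3, H, W)
        b, t, _, h, w = clip.shape
        hid = torch.zeros(b, self.feat_ch, h // 4, w // 4, device=clip.device)
        cell = torch.zeros_like(hid)
        masks = []
        for i in range(t):  # fuse per-frame features through the ConvLSTM state
            feat = self.encoder(clip[:, i])
            hid, cell = self.lstm(feat, (hid, cell))
            masks.append(self.up(self.head(hid)))
        return torch.stack(masks, dim=1)  # (B, T, 1, H, W) mask logits
```

A quick shape check: feeding a clip of five 64x64 frames, `VideoSegSketch()(torch.randn(1, 5, 3, 64, 64))`, yields logits of shape `(1, 5, 1, 64, 64)`, one mask per input frame.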