Keywords: instance segmentation; semisupervised learning; transformers; video processing

Source: DOI:10.3390/s24030997

Abstract:
We present a novel approach to video instance segmentation based on semisupervised learning. Our Cluster2Former model is trained on scribble-based annotations, significantly reducing the need for comprehensive pixel-level masks. We augment a video instance segmenter, such as the Mask2Former architecture, with a similarity-based constraint loss so that it can handle partial annotations efficiently. We demonstrate that, despite relying on lightweight annotations (only 0.5% of the annotated pixels), Cluster2Former achieves competitive performance on standard benchmarks. The approach offers a cost-effective and computationally efficient solution for video instance segmentation, especially in scenarios with limited annotation resources.
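The abstract does not spell out the form of the similarity-based constraint loss, so the following is only a minimal sketch of one common way such a loss can be built, not the authors' implementation. It assumes that each scribble-annotated pixel has a predicted probability distribution over K instance slots, that the similarity of two pixels is the dot product of their distributions, and that pairs are penalized with binary cross-entropy against a "same scribble / different scribble" target. The function name `pairwise_similarity_loss` and its parameters are hypothetical.

```python
import math
from itertools import combinations


def pairwise_similarity_loss(probs, labels, eps=1e-7):
    """Illustrative pairwise constraint loss over scribble-annotated pixels.

    probs  : list of N rows, each a probability distribution over K instance
             slots (assumed output of a segmenter for one annotated pixel).
    labels : list of N scribble ids; equal ids mean "same instance".
    Unannotated pixels are simply left out of both lists.
    """
    if len(probs) != len(labels) or len(probs) < 2:
        raise ValueError("need at least two annotated pixels with matching labels")

    total = 0.0
    count = 0
    for i, j in combinations(range(len(labels)), 2):
        # Similarity: probability that both pixels fall into the same slot.
        sim = sum(a * b for a, b in zip(probs[i], probs[j]))
        sim = min(max(sim, eps), 1.0 - eps)  # keep the logarithms finite
        target = 1.0 if labels[i] == labels[j] else 0.0
        # Binary cross-entropy between the similarity and the pair target.
        total -= target * math.log(sim) + (1.0 - target) * math.log(1.0 - sim)
        count += 1
    return total / count


# Three annotated pixels: the first two lie on one scribble, the third on another.
labels = [0, 0, 1]
consistent = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
inconsistent = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(pairwise_similarity_loss(consistent, labels) < pairwise_similarity_loss(inconsistent, labels))
```

Under these assumptions the loss depends only on whether pairs of pixels agree, not on which slot they are assigned to, so it needs no matching between scribble ids and output slots and is computed only on the small annotated subset of pixels.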