Keywords: abdominal multi-organs; attention mechanism; cardiac substructures; semi-supervised learning; unpaired multi-modal learning

MeSH: Algorithms; Heart; Supervised Machine Learning; Image Processing, Computer-Assisted

Source: DOI:10.1002/mp.16338

Abstract:
BACKGROUND: Multi-modal learning is widely adopted to learn the latent complementary information between different modalities in multi-modal medical image segmentation tasks. Nevertheless, traditional multi-modal learning methods require spatially well-aligned, paired multi-modal images for supervised training and therefore cannot leverage unpaired multi-modal images that exhibit spatial misalignment and modality discrepancy. To train accurate multi-modal segmentation networks from the easily accessible, low-cost unpaired multi-modal images available in clinical practice, unpaired multi-modal learning has recently received considerable attention.
OBJECTIVE: Existing unpaired multi-modal learning methods usually focus on the intensity distribution gap but ignore the scale variation problem between different modalities. Besides, existing methods frequently employ shared convolutional kernels to capture common patterns in all modalities, but these kernels are typically inefficient at learning global contextual information. Moreover, existing methods rely heavily on large numbers of labeled unpaired multi-modal scans for training, ignoring the practical scenario in which labeled data are limited. To solve the above problems, we propose a modality-collaborative convolution and transformer hybrid network (MCTHNet) that uses semi-supervised learning for unpaired multi-modal segmentation with limited annotations; it not only collaboratively learns modality-specific and modality-invariant representations, but also automatically leverages abundant unlabeled scans to improve performance.
METHODS: We make three main contributions in the proposed method. First, to alleviate the intensity distribution gap and scale variation problems across modalities, we develop a modality-specific scale-aware convolution (MSSC) module that can adaptively adjust its receptive field sizes and feature normalization parameters according to the input. Second, we propose a modality-invariant vision transformer (MIViT) module as the shared bottleneck layer for all modalities, which implicitly incorporates convolution-like local operations with the global processing of transformers to learn generalizable modality-invariant representations. Third, we design a multi-modal cross pseudo supervision (MCPS) method for semi-supervised learning, which enforces consistency between the pseudo segmentation maps generated by two perturbed networks to acquire abundant annotation information from unlabeled unpaired multi-modal scans.
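The abstract specifies only that MSSC adapts receptive-field sizes and feature normalization parameters to the input. Below is a minimal PyTorch sketch of one way such a module could look, assuming selective-kernel-style gating over parallel dilated convolutions and per-modality instance normalization; all class names, dilation rates, and hyperparameters are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class ScaleAwareConv(nn.Module):
    """Hypothetical MSSC-style block: input-dependent receptive field + per-modality norm."""
    def __init__(self, channels: int, num_modalities: int = 2, dilations=(1, 2, 4)):
        super().__init__()
        # Parallel branches with different dilation rates emulate multiple
        # receptive-field sizes over the same feature map.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        # A lightweight gate predicts a soft weight per branch from the globally
        # pooled input, making the effective receptive field input-dependent.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(dilations), 1),
            nn.Softmax(dim=1),
        )
        # One InstanceNorm per modality: normalization parameters that are
        # adjusted according to the input modality (an assumed realization).
        self.norms = nn.ModuleList(
            nn.InstanceNorm2d(channels, affine=True) for _ in range(num_modalities)
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, modality: int) -> torch.Tensor:
        weights = self.gate(x)                                       # (B, K, 1, 1)
        feats = torch.stack([b(x) for b in self.branches], dim=1)    # (B, K, C, H, W)
        fused = (weights.unsqueeze(2) * feats).sum(dim=1)            # (B, C, H, W)
        return self.act(self.norms[modality](fused))
```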
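Similarly, a hedged sketch of a convolution/transformer hybrid block in the spirit of the MIViT bottleneck: pairing a depthwise convolution (the conv-like local operation) with multi-head self-attention (the global processing) is an assumption, as the abstract does not detail the block design.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Hypothetical conv/transformer hybrid block shared by all modalities."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        # Local path: a depthwise 3x3 convolution supplies the implicit
        # convolution-like local operation.
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        # Global path: multi-head self-attention over the flattened spatial grid.
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x + self.local(x)                        # local pattern modeling
        t = x.flatten(2).transpose(1, 2)             # (B, H*W, C) token sequence
        q = self.norm1(t)
        t = t + self.attn(q, q, q, need_weights=False)[0]   # global context
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, h, w)
```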
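Finally, the consistency objective described for MCPS follows the general cross pseudo supervision pattern: two perturbed networks each treat the other's pseudo segmentation map as a training target on unlabeled scans. A minimal sketch of that loss wiring, assuming hard pseudo labels with cross-entropy; `net_a` and `net_b` are placeholder segmentation networks, and the multi-modal batching of the actual method is omitted.

```python
import torch
import torch.nn.functional as F

def cross_pseudo_supervision_loss(net_a, net_b, unlabeled: torch.Tensor) -> torch.Tensor:
    logits_a = net_a(unlabeled)                  # (B, num_classes, H, W)
    logits_b = net_b(unlabeled)
    # Hard pseudo labels are detached so each network only learns from the
    # other's predictions, never through its own gradient path.
    pseudo_a = logits_a.argmax(dim=1).detach()   # (B, H, W) class indices
    pseudo_b = logits_b.argmax(dim=1).detach()
    # Each network is pushed toward the other's pseudo segmentation map,
    # enforcing consistency between the two perturbed networks.
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)
```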
RESULTS: Extensive experiments are performed on two unpaired CT and MR segmentation datasets: a cardiac substructure dataset derived from the MMWHS-2017 dataset and an abdominal multi-organ dataset consisting of the BTCV and CHAOS datasets. Experimental results show that our proposed method significantly outperforms existing state-of-the-art methods under various labeling ratios and, by leveraging only a small portion of labeled data, achieves segmentation performance close to that of single-modal methods trained on fully labeled data. Specifically, at a labeling ratio of 25%, our proposed method achieves overall mean DSC values of 78.56% and 76.18% on cardiac and abdominal segmentation, respectively, improving the average DSC across the two tasks by 12.84% compared with single-modal U-Net models.
CONCLUSIONS: Our proposed method is beneficial for reducing the annotation burden of unpaired multi-modal medical images in clinical applications.