Keywords: Medical image segmentation; U-Net; Vision Transformers

MeSH: Humans; Neural Networks, Computer; Image Processing, Computer-Assisted / methods; Algorithms

Source: DOI: 10.1016/j.media.2024.103280

Abstract:
Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence prediction have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers' self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformers into medical image analysis. In this study, we present the versatile framework of TransUNet, which encapsulates Transformers' self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolutional neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder's efficacy in modeling interactions among multiple abdominal organs and the decoder's strength in handling small targets like tumors. TransUNet excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, it achieves significant average Dice improvements of 1.06% and 4.30% for multi-organ segmentation and pancreatic tumor segmentation, respectively, compared to the highly competitive nnU-Net, and surpasses the top-1 solution in the BraTS2021 challenge. 2D and 3D code and models are available at https://github.com/Beckschen/TransUNet and https://github.com/Beckschen/TransUNet-3D, respectively.