Keywords: Dual-scale shifted window attention; Medical image segmentation; Swin Transformer

Source: DOI:10.1038/s41598-024-68587-1

Abstract:
Swin Transformer is an important work among the attempts to reduce the computational complexity of Transformers while maintaining their excellent performance in computer vision. Window-based patch self-attention exploits the local connectivity of image features, and shifted window-based patch self-attention enables information to be communicated between different patches across the entire image. Through in-depth research on how the size of the shifted windows affects the efficiency of patch information communication, this article proposes a Dual-Scale Transformer with a double-sized shifted window attention method. The proposed method surpasses CNN-based methods such as U-Net, AttenU-Net, ResU-Net, and CE-Net by a considerable margin (approximately 3% to 6%), and outperforms the Transformer-based single-scale Swin Transformer (SwinT) by approximately 1%, on the Kvasir-SEG, ISIC2017, MICCAI EndoVisSub-Instrument, and CadVesSet datasets. The experimental results verify that the proposed dual-scale shifted window attention benefits the communication of patch information and can raise segmentation results to the state of the art. We also conduct an ablation study on the effect of the shifted window size on information flow efficiency and verify that dual-scale shifted window attention is the optimized network design. Our study highlights the significant impact of network structure design on visual performance and provides valuable insights for the design of networks based on Transformer architectures.
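The window mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the architecture, so the function names (`window_partition`, `window_self_attention`), the window sizes 2 and 4, the half-window shift, and the identity projections are all assumptions made for illustration. The attention mask that Swin Transformer applies to tokens wrapped around by the cyclic shift is omitted for brevity.

```python
import numpy as np


def window_partition(x, window_size, shift=0):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    If shift > 0, the map is cyclically rolled first, so window borders
    fall in different places than in the unshifted layout.
    Returns an array of shape (num_windows, window_size**2, C).
    """
    h, w, c = x.shape
    if h % window_size or w % window_size:
        raise ValueError("H and W must be divisible by window_size")
    if shift:
        x = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, c)


def window_self_attention(windows):
    """Single-head self-attention computed independently inside each window.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections, no learned weights).
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows


# Hypothetical dual-scale usage: the same 8x8 map attended at two window sizes.
feature_map = np.random.default_rng(0).standard_normal((8, 8, 4))
small = window_self_attention(window_partition(feature_map, 2, shift=1))
large = window_self_attention(window_partition(feature_map, 4, shift=2))
print(small.shape, large.shape)  # (16, 4, 4) (4, 16, 4)
```

With the larger window each token attends to 16 neighbours instead of 4, which is one way to picture why window size affects how quickly information spreads between patches. How the two scales are combined in the proposed network is not stated in the abstract and is left out here.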