medical image segmentation

  • Article type: Journal Article
    Lung cancer is a predominant cause of cancer-related mortality worldwide, necessitating precise tumor segmentation of medical images for accurate diagnosis and treatment. However, the intrinsic complexity and variability of tumor morphology pose substantial challenges to segmentation tasks. To address this issue, we propose a multitask connected U-Net model with a teacher-student framework to enhance the effectiveness of lung tumor segmentation. The proposed model and framework integrate PET knowledge into the segmentation process, leveraging complementary information from both CT and PET modalities to improve segmentation performance. Additionally, we implemented a tumor area detection method to enhance tumor segmentation performance. In extensive experiments on four datasets, the average Dice coefficient of 0.56, obtained using our model, surpassed those of existing methods such as Segformer (0.51), Transformer (0.50), and UctransNet (0.43). These findings validate the efficacy of the proposed method in lung tumor segmentation tasks.
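
    For context, the Dice coefficient reported above is the standard overlap metric between a predicted mask and the ground-truth mask. A minimal NumPy sketch of it (illustrative only, not the authors' evaluation code) is:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (1 = tumor, 0 = background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example with two overlapping square masks.
pred = np.zeros((8, 8), dtype=np.uint8)
target = np.zeros((8, 8), dtype=np.uint8)
pred[2:6, 2:6] = 1
target[3:7, 3:7] = 1
print(f"Dice = {dice_coefficient(pred, target):.3f}")  # roughly 0.56 for these masks
```
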
  • Article type: Journal Article
    Partially-supervised multi-organ medical image segmentation aims to develop a unified semantic segmentation model by utilizing multiple partially-labeled datasets, with each dataset providing labels for a single class of organs. However, the limited availability of labeled foreground organs and the absence of supervision to distinguish unlabeled foreground organs from the background pose a significant challenge, which leads to a distribution mismatch between labeled and unlabeled pixels. Although existing pseudo-labeling methods can be employed to learn from both labeled and unlabeled pixels, they are prone to performance degradation in this task, as they rely on the assumption that labeled and unlabeled pixels have the same distribution. In this paper, to address the problem of distribution mismatch, we propose a labeled-to-unlabeled distribution alignment (LTUDA) framework that aligns feature distributions and enhances discriminative capability. Specifically, we introduce a cross-set data augmentation strategy, which performs region-level mixing between labeled and unlabeled organs to reduce distribution discrepancy and enrich the training set. Besides, we propose a prototype-based distribution alignment method that implicitly reduces intra-class variation and increases the separation between the unlabeled foreground and background. This can be achieved by encouraging consistency between the outputs of two prototype classifiers and a linear classifier. Extensive experimental results on the AbdomenCT-1K dataset and a union of four benchmark datasets (including LiTS, MSD-Spleen, KiTS, and NIH82) demonstrate that our method outperforms the state-of-the-art partially-supervised methods by a considerable margin, and even surpasses the fully-supervised methods. The source code is publicly available at LTUDA.
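
    The cross-set augmentation described above is, at its core, region-level mixing between a labeled and an unlabeled image. A simplified CutMix-style sketch of that idea (the exact mixing policy in LTUDA may differ) is:

```python
import numpy as np

def cross_set_region_mix(labeled: np.ndarray, unlabeled: np.ndarray,
                         area_ratio: float = 0.25, rng=None):
    """Swap a random rectangular region between a labeled and an unlabeled
    image (H, W) so each mixed sample contains pixels from both sets."""
    rng = rng or np.random.default_rng()
    h, w = labeled.shape[-2:]
    rh, rw = max(1, int(h * np.sqrt(area_ratio))), max(1, int(w * np.sqrt(area_ratio)))
    y, x = rng.integers(0, h - rh + 1), rng.integers(0, w - rw + 1)
    mixed_l, mixed_u = labeled.copy(), unlabeled.copy()
    mixed_l[..., y:y + rh, x:x + rw] = unlabeled[..., y:y + rh, x:x + rw]
    mixed_u[..., y:y + rh, x:x + rw] = labeled[..., y:y + rh, x:x + rw]
    return mixed_l, mixed_u

l_img, u_img = np.ones((64, 64)), np.zeros((64, 64))
ml, mu = cross_set_region_mix(l_img, u_img)
print(ml.mean(), mu.mean())  # roughly 0.75 and 0.25: each image now carries pixels from the other set
```
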
  • Article type: Journal Article
    Limited data poses a crucial challenge for deep learning-based volumetric medical image segmentation, and many methods have tried to represent the volume by its subvolumes (i.e., multi-view slices) to alleviate this issue. However, such methods generally sacrifice inter-slice spatial continuity. Currently, a promising avenue involves incorporating multi-view information into the network to enhance volume representation learning, but most existing studies tend to overlook the discrepancy and dependency across different views, ultimately limiting the potential of multi-view representations. To this end, we propose a cross-view discrepancy-dependency network (CvDd-Net) to tackle volumetric medical image segmentation, which exploits the multi-view slice prior to assist volume representation learning and explores view discrepancy and view dependency for performance improvement. Specifically, we develop a discrepancy-aware morphology reinforcement (DaMR) module to effectively learn view-specific representations by mining morphological information (i.e., the boundary and position of the object). Besides, we design a dependency-aware information aggregation (DaIA) module to adequately harness the multi-view slice prior, enhancing individual view representations of the volume and integrating them based on cross-view dependency. Extensive experiments on four medical image datasets (i.e., Thyroid, Cervix, Pancreas, and Glioma) demonstrate the efficacy of the proposed method on both fully-supervised and semi-supervised tasks.
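
    The multi-view slice prior referred to above amounts to viewing the same volume along different anatomical axes. A minimal sketch of extracting such views (illustrative only, not the CvDd-Net pipeline):

```python
import numpy as np

def multi_view_slices(volume: np.ndarray, index=None):
    """Return one axial, coronal, and sagittal slice from a (D, H, W) volume.
    Centre slices by default; a real multi-view pipeline would iterate over
    every index of each view."""
    d, h, w = volume.shape
    zi, yi, xi = index if index is not None else (d // 2, h // 2, w // 2)
    axial = volume[zi, :, :]       # slice perpendicular to the depth axis
    coronal = volume[:, yi, :]     # slice perpendicular to the height axis
    sagittal = volume[:, :, xi]    # slice perpendicular to the width axis
    return axial, coronal, sagittal

vol = np.random.rand(32, 128, 128)
for name, s in zip(("axial", "coronal", "sagittal"), multi_view_slices(vol)):
    print(name, s.shape)
```
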
  • Article type: Journal Article
    BACKGROUND: Cone beam computed tomography (CBCT) image segmentation is crucial in prostate cancer radiotherapy, enabling precise delineation of the prostate gland for accurate treatment planning and delivery. However, the poor quality of CBCT images poses challenges in clinical practice, making annotation difficult due to factors such as image noise, low contrast, and organ deformation.
    OBJECTIVE: The objective of this study is to create a segmentation model for the label-free target domain (CBCT), leveraging valuable insights derived from the label-rich source domain (CT). This goal is achieved by addressing the domain gap across diverse domains through the implementation of a cross-modality medical image segmentation framework.
    METHODS: Our approach introduces a multi-scale domain adaptive segmentation method, performing domain adaptation simultaneously at both the image and feature levels. The primary innovation lies in a novel multi-scale anatomical regularization approach, which (i) aligns the target domain feature space with the source domain feature space at multiple spatial scales simultaneously, and (ii) exchanges information across different scales to fuse knowledge from multi-scale perspectives.
    RESULTS: Quantitative and qualitative experiments were conducted on pelvic CBCT segmentation tasks. The training dataset comprises 40 unpaired CBCT-CT images with only the CT images annotated. The validation and testing datasets consist of 5 and 10 CT images, respectively, all with annotations. The experimental results demonstrate the superior performance of our method compared to other state-of-the-art cross-modality medical image segmentation methods. The Dice similarity coefficient (DSC) for the CBCT image segmentation results is 74.6 ± 9.3%, and the average symmetric surface distance (ASSD) is 3.9 ± 1.8 mm. Statistical analysis confirms the statistical significance of the improvements achieved by our method.
    CONCLUSIONS: Our method exhibits superiority in pelvic CBCT image segmentation compared to its counterparts.
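
    A toy sketch of the scale-wise feature alignment idea from the METHODS paragraph, matching simple channel statistics between source and target domains at every scale (the paper's multi-scale anatomical regularization is considerably richer; this is an assumed simplification):

```python
import torch
import torch.nn.functional as F

def multi_scale_alignment_loss(source_feats, target_feats):
    """Penalise the gap between source- and target-domain channel statistics
    at each spatial scale; inputs are lists of (B, C, H, W) tensors, one per scale."""
    loss = torch.zeros(())
    for fs, ft in zip(source_feats, target_feats):
        loss = loss + F.mse_loss(fs.mean(dim=(0, 2, 3)), ft.mean(dim=(0, 2, 3)))
    return loss

src = [torch.randn(2, 16, 64, 64), torch.randn(2, 32, 32, 32)]
tgt = [torch.randn(2, 16, 64, 64), torch.randn(2, 32, 32, 32)]
print(multi_scale_alignment_loss(src, tgt))
```
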
  • Article type: Journal Article
    Existing medical image segmentation methods may consider feature extraction and information processing only in the spatial domain, lack a design for interaction between frequency information and spatial information, or ignore the semantic gaps between shallow and deep features, leading to inaccurate segmentation results. Therefore, in this paper, we propose a novel frequency selection segmentation network (FSSN), which achieves more accurate lesion segmentation by fusing local spatial features with global frequency information, designing better feature interactions, and suppressing low-correlation frequency components to mitigate semantic gaps. Firstly, we propose a global-local feature aggregation module (GLAM) that simultaneously captures multi-scale local features in the spatial domain and exploits global frequency information in the frequency domain, achieving complementary fusion of local detail features and global frequency information. Secondly, we propose a feature filter module (FFM) to mitigate semantic gaps during cross-level feature fusion and to make FSSN discriminatively determine which frequency information should be preserved for accurate lesion segmentation. Finally, to make better use of local information, especially the boundary of the lesion region, we employ deformable convolution (DC) to extract pertinent features in the local range, so that FSSN can better focus on relevant image content. Extensive experiments on two public benchmark datasets show that, compared with representative medical image segmentation methods, our FSSN obtains more accurate lesion segmentation results in terms of both objective evaluation indicators and subjective visual effects, with fewer parameters and lower computational complexity.
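
    To illustrate the kind of frequency-domain operation FSSN builds on, the sketch below applies a fixed low-pass mask to a feature map via the 2D FFT; in FSSN the selection of frequency components is learned, so the fixed mask here is purely an assumption for illustration:

```python
import torch

def lowpass_frequency_filter(feat: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """Transform a (B, C, H, W) feature map to the frequency domain, keep a
    centred low-frequency window, and transform back."""
    freq = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    _, _, h, w = feat.shape
    mask = torch.zeros(h, w, device=feat.device)
    ch, cw = h // 2, w // 2
    dh, dw = max(1, int(h * keep_ratio / 2)), max(1, int(w * keep_ratio / 2))
    mask[ch - dh:ch + dh, cw - dw:cw + dw] = 1.0
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real

x = torch.randn(1, 8, 32, 32)
print(lowpass_frequency_filter(x).shape)  # torch.Size([1, 8, 32, 32])
```
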
  • Article type: Journal Article
    Deep learning-based methods for fast target segmentation of computed tomography (CT) imaging have become increasingly popular. The success of current deep learning methods usually depends on a large amount of labeled data. Labeling medical data is a time-consuming and laborious task. Therefore, this paper aims to enhance the segmentation of CT images by using a semi-supervised learning method. In order to utilize the valid information in unlabeled data, we design a semi-supervised network model for contrastive learning based on entropy constraints. We use CNN and Transformer to capture the image's local and global feature information, respectively. In addition, the pseudo-labels generated by the teacher networks are unreliable and will lead to degradation of the model performance if they are directly added to the training. Therefore, unreliable samples with high entropy values are discarded to avoid the model extracting the wrong features. In the student network, we also introduce the residual squeeze and excitation module to learn the connection between different channels of each layer feature to obtain better segmentation performance. We demonstrate the effectiveness of the proposed method on the COVID-19 CT public dataset. We mainly considered three evaluation metrics: DSC, HD95, and JC. Compared with several existing state-of-the-art semi-supervised methods, our method improves DSC by 2.3%, JC by 2.5%, and reduces HD95 by 1.9 mm. In this paper, a semi-supervised medical image segmentation method is designed by fusing CNN and Transformer and utilizing entropy-constrained contrastive learning loss, which improves the utilization of unlabeled medical images.
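
    The entropy-based filtering of unreliable pseudo-labels described above can be summarised in a few lines; the threshold below is an assumed value, not the one used in the paper:

```python
import torch
import torch.nn.functional as F

def filter_pseudo_labels(teacher_logits: torch.Tensor, entropy_thresh: float = 0.5):
    """Turn teacher logits (B, C, H, W) into hard pseudo-labels and a boolean
    mask that keeps only pixels whose normalised prediction entropy is low."""
    probs = F.softmax(teacher_logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)             # (B, H, W)
    entropy = entropy / torch.log(torch.tensor(float(probs.shape[1])))  # normalise to [0, 1]
    pseudo_labels = probs.argmax(dim=1)                                  # (B, H, W)
    keep_mask = entropy < entropy_thresh
    return pseudo_labels, keep_mask

logits = torch.randn(2, 4, 64, 64)
labels, keep = filter_pseudo_labels(logits)
print(labels.shape, keep.float().mean())  # fraction of pixels kept for the unsupervised loss
```
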
  • Article type: Journal Article
    UNet architecture has achieved great success in medical image segmentation applications. However, these models still encounter several challenges. One is the loss of pixel-level information caused by multiple down-sampling steps. Additionally, the addition or concatenation method used in the decoder can generate redundant information. These limitations affect the localization ability, weaken the complementarity of features at different levels and can lead to blurred boundaries. However, differential features can effectively compensate for these shortcomings and significantly enhance the performance of image segmentation. Therefore, we propose MGRAD-UNet (multi-gated reverse attention multi-scale differential UNet) based on UNet. We utilize the multi-scale differential decoder to generate abundant differential features at both the pixel level and structure level. These features, which serve as gate signals, are transmitted to the gate controller and forwarded to the other differential decoder. In order to enhance the focus on important regions, the other differential decoder is equipped with reverse attention. The features obtained by the two differential decoders are differentiated a second time. The resulting differential feature is sent back to the controller as a control signal and then transmitted to the encoder for learning the differential features from the two differential decoders. The core design of MGRAD-UNet lies in extracting comprehensive and accurate features through caching overall differential features and multi-scale differential processing, enabling iterative learning from diverse information. We evaluate MGRAD-UNet against state-of-the-art (SOTA) methods on two public datasets. Our method surpasses competitors and provides a new approach for the design of UNet.
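
    A stripped-down illustration of using a differential feature as a gate signal, in the spirit of the description above (MGRAD-UNet's gating is learned and multi-scale; this is only an assumed toy version):

```python
import torch

def differential_gate(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Use the absolute difference between two decoder feature maps as a gate:
    regions where the decoders disagree receive stronger emphasis."""
    gate = torch.sigmoid(torch.abs(feat_a - feat_b))
    return feat_a * gate

a, b = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
print(differential_gate(a, b).shape)  # torch.Size([1, 16, 32, 32])
```
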
  • Article type: Journal Article
    Computer-aided diagnosis has been slow to develop in the field of oral ulcers, largely because of the lack of publicly available datasets. However, oral ulcers can present as cancerous lesions with a high mortality rate, so the ability to recognize oral ulcers at an early stage in a timely and effective manner is a critical issue. In recent years, although a small group of researchers has been working on this problem, their datasets remain private. Therefore, to address this challenge, this paper proposes and makes publicly available a multi-task oral ulcer dataset (Autooral) covering two major tasks: lesion segmentation and classification. To the best of our knowledge, we are the first team to publicly release a multi-task oral ulcer dataset. In addition, we propose a novel modeling framework, HF-UNet, for segmenting oral ulcer lesion regions. Specifically, the proposed high-order focus interaction module (HFblock) acquires global properties and, through high-order attention, focuses on acquiring local properties. The proposed lesion localization module (LL-M) employs a novel hybrid Sobel filter, which improves the recognition of ulcer edges. Experimental results on the proposed Autooral dataset show that HF-UNet achieves a DSC of about 0.80 for oral ulcer segmentation while its inference memory footprint is only 2029 MB. The proposed method thus guarantees a low running load while maintaining high-performance segmentation capability. The proposed Autooral dataset and code are available from https://github.com/wurenkai/HF-UNet-and-Autooral-dataset.
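
    For reference, the classical Sobel operator that the hybrid filter in LL-M builds on can be written as two fixed 3x3 convolutions; the hybrid variant itself is not reproduced here:

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """Plain Sobel edge response for a single-channel image batch (B, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(-1, -2)            # vertical-gradient kernel
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

edges = sobel_edges(torch.rand(1, 1, 64, 64))
print(edges.shape)  # torch.Size([1, 1, 64, 64])
```
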
  • Article type: Journal Article
    In deep-learning-based medical image segmentation tasks, semi-supervised learning can greatly reduce the dependence of the model on labeled data. However, existing semi-supervised medical image segmentation methods face the challenges of object boundary ambiguity and a small amount of available data, which limit the application of segmentation models in clinical practice. To solve these problems, we propose a novel semi-supervised medical image segmentation network based on dual-consistency guidance, which can extract reliable semantic information from unlabeled data over a large spatial and dimensional range in a simple and effective manner. This serves to improve the contribution of unlabeled data to the model accuracy. Specifically, we construct a split weak and strong consistency constraint strategy to capture data-level and feature-level consistencies from unlabeled data to improve the learning efficiency of the model. Furthermore, we design a simple multi-scale low-level detail feature enhancement module to improve the extraction of low-level detail contextual information, which is crucial to accurately locate object contours and avoid omitting small objects in semi-supervised medical image dense prediction tasks. Quantitative and qualitative evaluations on six challenging datasets demonstrate that our model outperforms other semi-supervised segmentation models in terms of segmentation accuracy and presents advantages in terms of generalizability. Code is available at https://github.com/0Jmyy0/SSMIS-DC.
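
    The weak/strong consistency constraint mentioned above follows a common recipe: predictions on a weakly augmented view supervise a strongly augmented view of the same unlabeled image. A minimal sketch under that assumption (not the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def weak_strong_consistency_loss(model, unlabeled, weak_aug, strong_aug):
    """Pseudo-labels from the weakly augmented view supervise the strongly
    augmented view; `weak_aug` / `strong_aug` are user-supplied callables."""
    with torch.no_grad():
        pseudo = F.softmax(model(weak_aug(unlabeled)), dim=1).argmax(dim=1)
    student_logits = model(strong_aug(unlabeled))
    return F.cross_entropy(student_logits, pseudo)

# Example usage with a tiny 1x1-conv "segmentation model" and identity/noise augmentations.
model = torch.nn.Conv2d(1, 3, kernel_size=1)
x = torch.randn(2, 1, 32, 32)
loss = weak_strong_consistency_loss(model, x, lambda t: t, lambda t: t + 0.1 * torch.randn_like(t))
print(loss)
```
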
  • Article type: Journal Article
    The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance declines significantly when applied to medical images, primarily due to the substantial disparity between the natural and medical image domains. To adapt SAM effectively to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. Simultaneously, we aim to harness SAM's pre-trained weights within its original 2D backbone to the fullest extent. In this paper, we introduce a modality-agnostic SAM adaptation framework, named MA-SAM, that is applicable to various volumetric and video medical data. Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from the input data. We comprehensively evaluate our method on five medical image segmentation tasks, using 11 public datasets spanning CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation, respectively. Our model also demonstrates strong generalization, and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM.
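
    To make the adapter idea concrete, the sketch below shows one possible shape for a 3D adapter attached to tokens produced by a frozen 2D transformer block: a bottleneck projection plus a depth-wise 3D convolution so that neighbouring slices can exchange information. Layer sizes and the token layout are illustrative assumptions, not MA-SAM's actual design.

```python
import torch
import torch.nn as nn

class Adapter3D(nn.Module):
    """Residual bottleneck adapter with a depth-wise 3D convolution, letting
    features from neighbouring slices/frames interact around a 2D backbone."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.conv3d = nn.Conv3d(bottleneck, bottleneck, kernel_size=3,
                                padding=1, groups=bottleneck)  # depth-wise
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # x: (B*D, N, C) tokens from a 2D block applied slice-by-slice,
        # where N = H*W patch tokens and D = number of slices.
        bd, n, c = x.shape
        b, d = bd // depth, depth
        h = w = int(n ** 0.5)
        z = self.act(self.down(x))                                # (B*D, N, bottleneck)
        z = z.view(b, d, h, w, -1).permute(0, 4, 1, 2, 3)         # (B, C', D, H, W)
        z = self.act(self.conv3d(z))
        z = z.permute(0, 2, 3, 4, 1).reshape(bd, n, -1)           # back to token layout
        return x + self.up(z)                                     # residual adapter

tokens = torch.randn(2 * 4, 16 * 16, 256)       # batch 2, 4 slices, 16x16 patches, dim 256
print(Adapter3D(256)(tokens, depth=4).shape)    # torch.Size([8, 256, 256])
```
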