Keywords: GPT; anomaly detection; domain generalization; multimodal contrastive learning

MeSH: Humans; Algorithms; Image Processing, Computer-Assisted / methods

Source: DOI:10.3390/biom14050590

Abstract:
Medical data have unique specificity and professionalism, requiring substantial domain expertise for their annotation. Precise data annotation is essential for anomaly-detection tasks, making the training process complex. Domain generalization (DG) is an important approach to enhancing medical image anomaly detection (AD). This paper introduces a novel multimodal anomaly-detection framework called MedicalCLIP. MedicalCLIP utilizes multimodal data in anomaly-detection tasks and establishes irregular constraints within the image and text modalities. The key to MedicalCLIP lies in learning intramodal detailed representations, which are combined with text semantic-guided cross-modal contrastive learning, allowing the model to focus on semantic information while capturing more detailed information, thus achieving more fine-grained anomaly detection. MedicalCLIP relies on GPT prompts to generate text, reducing the demand for professional descriptions of medical data. Text construction for medical data helps to improve the generalization ability of multimodal models for anomaly-detection tasks. Additionally, during the text-image contrast-enhancement process, the model's ability to select and extract information from image data is improved. Through hierarchical contrastive loss, fine-grained representations are achieved in the image-representation process. MedicalCLIP has been validated on various medical datasets, showing commendable domain generalization performance in medical-data anomaly detection. Improvements were observed in both anomaly classification and segmentation metrics. In the anomaly classification (AC) task involving brain data, the method demonstrated a 2.81-point performance improvement over the best existing approach.
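The abstract describes a CLIP-style pipeline in which GPT-generated textual descriptions of normal and abnormal findings guide image representations, and anomaly decisions come from cross-modal similarity. The following is a minimal, self-contained PyTorch sketch of that general idea only; the prompt wording, temperature, averaging of prompt embeddings, and scoring formula are illustrative assumptions and not the authors' actual MedicalCLIP implementation.

```python
# Hypothetical sketch of text-guided anomaly scoring in a CLIP-like setup.
# Feature tensors stand in for the outputs of pretrained image/text encoders.
import torch
import torch.nn.functional as F


def anomaly_score(image_emb: torch.Tensor,
                  normal_text_emb: torch.Tensor,
                  abnormal_text_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Score each image against GPT-generated prompt embeddings.

    image_emb:         (B, D) image features
    normal_text_emb:   (Kn, D) embeddings of prompts describing healthy tissue
    abnormal_text_emb: (Ka, D) embeddings of prompts describing lesions/anomalies
    Returns a (B,) tensor in [0, 1]; higher means more anomalous.
    """
    img = F.normalize(image_emb, dim=-1)
    normal = F.normalize(normal_text_emb, dim=-1).mean(dim=0, keepdim=True)
    abnormal = F.normalize(abnormal_text_emb, dim=-1).mean(dim=0, keepdim=True)
    # Cosine similarity to the averaged "normal" and "abnormal" prompt embeddings,
    # turned into a two-way softmax; the abnormal probability is the score.
    sims = torch.cat([img @ normal.t(), img @ abnormal.t()], dim=1) / temperature
    return sims.softmax(dim=1)[:, 1]


if __name__ == "__main__":
    # Usage example with random features standing in for encoder outputs.
    torch.manual_seed(0)
    img_feats = torch.randn(4, 512)         # 4 images, 512-dim CLIP-like features
    normal_prompts = torch.randn(8, 512)    # e.g. "an MRI slice of a healthy brain"
    abnormal_prompts = torch.randn(8, 512)  # e.g. "an MRI slice containing a tumor"
    print(anomaly_score(img_feats, normal_prompts, abnormal_prompts))
```

Averaging several GPT-generated prompts per class is one common way to reduce sensitivity to any single phrasing; the paper's hierarchical contrastive loss over intramodal image representations is not reproduced here.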