Self-supervised learning

  • Article type: Journal Article
    Deep learning has the potential to automate the screening, monitoring and grading of disease in medical images. Pretraining with contrastive learning enables models to extract robust and generalisable features from natural image datasets, facilitating label-efficient downstream image analysis. However, the direct application of conventional contrastive methods to medical datasets introduces two domain-specific issues. Firstly, several image transformations which have been shown to be crucial for effective contrastive learning do not translate from the natural image to the medical image domain. Secondly, the assumption made by conventional methods, that any two images are dissimilar, is systematically misleading in medical datasets depicting the same anatomy and disease. This is exacerbated in longitudinal image datasets that repeatedly image the same patient cohort to monitor their disease progression over time. In this paper, we tackle these issues by extending conventional contrastive frameworks with a novel metadata-enhanced strategy. Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships. To this end, we employ records for patient identity, eye position (i.e. left or right) and time series information. In experiments using two large longitudinal datasets containing 170,427 retinal optical coherence tomography (OCT) images of 7912 patients with age-related macular degeneration (AMD), we evaluate the utility of using metadata to incorporate the temporal dynamics of disease progression into pretraining. Our metadata-enhanced approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks related to AMD. We find benefits in both low-data and high-data regimes across tasks ranging from AMD stage and type classification to prediction of visual acuity. Due to its modularity, our method can be quickly and cost-effectively tested to establish the potential benefits of including available metadata in contrastive pretraining.
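    The core idea above, replacing the assumption that all other images are negatives with metadata-derived positives, can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names, the SupCon-style loss, and integer-encoded eye positions are assumptions.

```python
import torch
import torch.nn.functional as F

def metadata_positive_mask(patient_ids, eye_positions):
    """Boolean [N, N] mask: True where two scans share patient ID and eye
    position (eye encoded as 0 = left, 1 = right) and should attract."""
    pid = torch.as_tensor(patient_ids)
    eye = torch.as_tensor(eye_positions)
    mask = (pid.unsqueeze(0) == pid.unsqueeze(1)) & \
           (eye.unsqueeze(0) == eye.unsqueeze(1))
    mask.fill_diagonal_(False)          # a scan is not its own positive
    return mask

def metadata_contrastive_loss(z, mask, temperature=0.1):
    """SupCon-style loss over the metadata-defined positive pairs."""
    z = F.normalize(z, dim=1)
    logits = z @ z.T / temperature
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, float('-inf'))  # no self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = mask.float()
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```

    In a batch holding several scans of one patient's eye over time, the mask marks those scans as mutual positives, so the loss pulls their embeddings together instead of treating longitudinal repeats as negatives.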

  • Article type: Journal Article
    The burgeoning field of brain health research increasingly leverages artificial intelligence (AI) to analyze and interpret neuroimaging data. Medical foundation models have shown promise, delivering superior performance with better sample efficiency. This work introduces a novel approach towards creating 3-dimensional (3D) medical foundation models for multimodal neuroimage segmentation through self-supervised training. Our method involves a novel two-stage pretraining procedure using vision transformers. The first stage encodes anatomical structures in generally healthy brains from a large-scale unlabeled neuroimage dataset of multimodal brain magnetic resonance imaging (MRI) images from 41,400 participants. This stage of pretraining focuses on identifying key features such as shapes and sizes of different brain structures. The second pretraining stage identifies disease-specific attributes, such as geometric shapes of tumors and lesions and their spatial placements within the brain. This dual-phase methodology significantly reduces the extensive data requirements usually necessary for AI model training in neuroimage segmentation, with the flexibility to adapt to various imaging modalities. We rigorously evaluate our model, BrainSegFounder, using the Brain Tumor Segmentation (BraTS) challenge and Anatomical Tracings of Lesions After Stroke v2.0 (ATLAS v2.0) datasets. BrainSegFounder demonstrates a significant performance gain, surpassing the achievements of the previous winning solutions that used fully supervised learning. Our findings underscore the impact of scaling up both the model complexity and the volume of unlabeled training data derived from generally healthy brains. Both of these factors enhance the accuracy and predictive capabilities of the model in neuroimage segmentation tasks. Our pretrained models and code are available at https://github.com/lab-smile/BrainSegFounder.
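    A hedged sketch of the two-stage schedule described above, assuming a simple masked-reconstruction pretext task; the released BrainSegFounder code may differ, and the model, loaders, and hyperparameters here are placeholders.

```python
import torch

def masked_recon_loss(model, volume, mask_ratio=0.6):
    """Zero out a random subset of voxels and score the reconstruction
    only on the masked positions (an assumed pretext objective)."""
    mask = torch.rand_like(volume) < mask_ratio
    recon = model(volume.masked_fill(mask, 0.0))
    return ((recon - volume)[mask] ** 2).mean()

def pretrain_stage(model, loader, epochs, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for volume in loader:               # volume: [B, C, D, H, W]
            loss = masked_recon_loss(model, volume)
            opt.zero_grad(); loss.backward(); opt.step()
    return model

# Stage 1 (hypothetical loaders): encode healthy anatomy at scale.
# model = pretrain_stage(vit3d, healthy_brain_loader, epochs=100, lr=1e-4)
# Stage 2: continue pretraining on disease cohorts for lesion geometry.
# model = pretrain_stage(model, disease_loader, epochs=50, lr=1e-5)
```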

  • Article type: Journal Article
    The rapid dissemination of unverified information through social platforms like Twitter poses considerable dangers to societal stability. Identifying real versus fake claims is challenging, and previous rumor detection methods often fail to effectively capture propagation structure features. These methods also often overlook the presence of comments irrelevant to the discussion topic of the source post. To address this, we introduce a novel approach: the Structure-Aware Multilevel Graph Attention Network (SAMGAT) for rumor classification. SAMGAT employs a dynamic attention mechanism that blends GATv2 and dot-product attention to capture the contextual relationships between posts, allowing attention scores to vary based on the stance of the central node. The model incorporates a structure-aware attention mechanism that learns attention weights indicating the existence of edges, effectively reflecting the propagation structure of rumors. Moreover, SAMGAT employs a top-k attention filtering mechanism to select the most relevant neighboring nodes, enhancing its ability to focus on the key structural features of rumor propagation. Furthermore, SAMGAT includes a claim-guided attention pooling mechanism with a thresholding step to focus on the most informative posts when constructing the event representation. Experimental results on benchmark datasets demonstrate that SAMGAT outperforms state-of-the-art methods in identifying rumors and improves the effectiveness of early rumor detection.
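    The two attention components most specific to SAMGAT, the GATv2/dot-product blend and top-k neighbor filtering, could look roughly like the sketch below; the blending weight and the dense [N, N] score layout are illustrative assumptions.

```python
import torch

def blended_scores(gatv2_scores, dot_scores, lam=0.5):
    """Mix GATv2-style and dot-product attention logits (lam is assumed)."""
    return lam * gatv2_scores + (1.0 - lam) * dot_scores

def topk_attention(scores, k):
    """Keep each node's k highest-scoring neighbors and softmax over them."""
    k = min(k, scores.size(1))
    kth = scores.topk(k, dim=1).values[:, -1:]  # per-row k-th largest logit
    masked = scores.masked_fill(scores < kth, float('-inf'))
    return torch.softmax(masked, dim=1)         # rows sum to 1 over top-k
```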

  • Article type: Journal Article
    Traumatic Brain Injury (TBI) presents a broad spectrum of clinical presentations and outcomes due to its inherent heterogeneity, leading to diverse recovery trajectories and varied therapeutic responses. While many studies have delved into TBI phenotyping for distinct patient populations, identifying TBI phenotypes that consistently generalize across various settings and populations remains a critical research gap. Our research addresses this by employing multivariate time-series clustering to unveil TBI's dynamic intricacies. Utilizing a self-supervised learning-based approach to clustering multivariate time-series data with missing values (SLAC-Time), we analyzed both the research-centric TRACK-TBI and the real-world MIMIC-IV datasets. Remarkably, the optimal hyperparameters of SLAC-Time and the ideal number of clusters remained consistent across these datasets, underscoring SLAC-Time's stability across heterogeneous datasets. Our analysis revealed three generalizable TBI phenotypes (α, β, and γ), each exhibiting distinct non-temporal features during emergency department visits and distinct temporal feature profiles throughout ICU stays. Specifically, phenotype α represents mild TBI with a remarkably consistent clinical presentation. In contrast, phenotype β signifies severe TBI with diverse clinical manifestations, and phenotype γ represents a moderate TBI profile in terms of severity and clinical diversity. Age is a significant determinant of TBI outcomes, with older cohorts recording higher mortality rates. Importantly, while certain features varied by age, the core characteristics of TBI manifestations tied to each phenotype remain consistent across diverse populations.
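    The pipeline shape, embedding each multivariate series despite missing values and then clustering the embeddings, can be illustrated with a deliberately simplified stand-in for the learned SLAC-Time encoder:

```python
import numpy as np
from sklearn.cluster import KMeans

def toy_embed(series):
    """Stand-in for the learned encoder: per-variable mean, std, and
    missing rate of a [T, D] series, with NaN marking missing values."""
    missing = np.isnan(series)
    filled = np.where(missing, 0.0, series)
    return np.concatenate([filled.mean(0), filled.std(0), missing.mean(0)])

def cluster_phenotypes(series_list, n_clusters=3, seed=0):
    Z = np.stack([toy_embed(s) for s in series_list])
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(Z)
```

    In the actual method the embedding comes from self-supervised training rather than summary statistics; only the embed-then-cluster structure is shown here.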

  • Article type: Journal Article
    Pediatric respiratory disease diagnosis and subsequent treatment require accurate and interpretable analysis. A chest X-ray is the most cost-effective and rapid method for identifying and monitoring various thoracic diseases in children. Recent developments in self-supervised and transfer learning have shown their potential in medical imaging, including the chest X-ray domain. In this article, we propose a three-stage framework with knowledge transfer from adult chest X-rays to aid the diagnosis and interpretation of pediatric thorax diseases. We conducted comprehensive experiments with different pre-training and fine-tuning strategies to develop transformer and convolutional neural network models, which we then evaluated qualitatively and quantitatively. The ViT-Base/16 model, fine-tuned on the CheXpert dataset, a large chest X-ray dataset, emerged as the most effective, achieving a mean AUC of 0.761 (95% CI: 0.759-0.763) across six disease categories and demonstrating high sensitivity (average 0.639) and specificity (average 0.683), indicative of its strong discriminative ability. The baseline models, ViT-Small/16 and ViT-Base/16, when trained directly on the Pediatric CXR dataset, achieved mean AUC scores of only 0.646 (95% CI: 0.641-0.651) and 0.654 (95% CI: 0.648-0.660), respectively. Qualitatively, our model excels at localizing diseased regions, outperforming models pre-trained on ImageNet and other fine-tuning approaches, thus providing superior explanations. The source code is available online, and the data can be obtained from PhysioNet.
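    A minimal transfer-learning sketch in the spirit of this framework, assuming the timm library; the loss, learning rate, and staging comments are illustrative rather than the authors' settings.

```python
import timm
import torch

# Stages 1-2 (assumed): start from ImageNet weights and adapt on a large
# adult CXR corpus such as CheXpert; stage 3: fine-tune on pediatric data.
model = timm.create_model('vit_base_patch16_224', pretrained=True,
                          num_classes=6)        # six disease categories
criterion = torch.nn.BCEWithLogitsLoss()        # multi-label CXR findings
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def finetune_step(images, labels):              # images: [B, 3, 224, 224]
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```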

  • Article type: Journal Article
    Autonomous driving technology has become widely prevalent, and intelligent vehicles are equipped with various sensors (e.g., vision sensors, LiDAR, and depth cameras). Among them, vision systems with tailored semantic segmentation and perception algorithms play a critical role in scene understanding. However, traditional supervised semantic segmentation requires a large number of pixel-level manual annotations for model training. Although few-shot methods reduce the annotation work to some extent, they are still labor-intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts; one of them serves as the support mask for few-shot segmentation, while cross-entropy losses are calculated separately between the predicted results and the other part and the entire region as multi-task learning, so as to improve the model's generalization ability. Swin Transformer is used as our backbone to extract feature maps at different scales. These feature maps are then input to multiple levels of dense attention computation blocks to enhance pixel-level correspondence. The final prediction results are obtained through inter-scale mixing and feature skip connections. The experimental results indicate that MLDAC obtains 55.1% and 26.8% one-shot mIoU for self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, demonstrating its efficacy.
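    The episode construction at the heart of the self-supervision, splitting a salient region so one half plays the support role, might look like the following sketch (the vertical median split is an assumption, and a non-empty mask is assumed):

```python
import torch

def make_episode(saliency):
    """Split a binary [H, W] saliency mask into support and query halves
    at the salient region's median column."""
    cols = saliency.any(dim=0).nonzero().squeeze(1)  # occupied columns
    mid = int(cols[len(cols) // 2])
    support = saliency.clone(); support[:, mid:] = False
    query = saliency.clone(); query[:, :mid] = False
    return support, query, saliency  # targets for the multi-task CE losses
```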

  • Article type: Journal Article
    Robust object detection in complex environments, poor visual conditions, and open scenarios presents significant technical challenges in autonomous driving. These challenges necessitate the development of advanced fusion methods for millimeter-wave (mmWave) radar point cloud data and visual images. To address these issues, this paper proposes a radar-camera robust fusion network (RCRFNet), which leverages self-supervised learning and open-set recognition to effectively utilise the complementary information from both sensors. Specifically, the network uses matched radar-camera data through a frustum association approach to generate self-supervised signals, enhancing network training. The integration of global and local depth consistencies between radar point clouds and visual images, along with image features, helps construct object class confidence levels for detecting unknown targets. Additionally, these techniques are combined with a multi-layer feature extraction backbone and a multimodal feature detection head to achieve robust object detection. Experiments on the nuScenes public dataset demonstrate that RCRFNet outperforms state-of-the-art (SOTA) methods, particularly under low-visibility conditions and when detecting unknown-class objects.
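    Frustum association itself is straightforward to sketch: project radar points into the image and keep those falling inside a 2-D detection box. The names and the pinhole model below are assumptions, not RCRFNet internals.

```python
import numpy as np

def frustum_associate(radar_xyz, K, box):
    """radar_xyz: [N, 3] points in the camera frame; K: 3x3 intrinsics;
    box: (x1, y1, x2, y2) detection in pixels. Returns in-frustum points."""
    uvw = (K @ radar_xyz.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]               # perspective projection
    x1, y1, x2, y2 = box
    inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2) &
              (radar_xyz[:, 2] > 0))            # in front of the camera
    return radar_xyz[inside]
```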

  • Article type: Journal Article
    The implementation of deep learning in Magnetic Resonance Imaging (MRI) has significantly reduced data acquisition times. However, these techniques face substantial limitations in scenarios where acquiring fully sampled datasets is unfeasible or costly. To tackle this problem, we propose a Fusion-enhanced Contrastive Self-Supervised Learning (FCSSL) method for parallel MRI reconstruction, eliminating the need for fully sampled k-space training datasets and coil sensitivity maps. First, we introduce a strategy based on two pairs of re-undersampling masks within a contrastive learning framework, aimed at enhancing representational capacity to achieve higher-quality reconstruction. Subsequently, a novel adaptive fusion network, trained in a self-supervised manner, is designed to integrate the reconstruction results of the framework. Experimental results on knee datasets under different sampling masks demonstrate that the proposed FCSSL achieves superior reconstruction performance compared to other self-supervised learning methods. Moreover, the performance of FCSSL approaches that of supervised methods, especially under the 2DRU and RADU masks. The proposed FCSSL, trained under the 3× 1DRU and 2DRU masks, can effectively generalize to unseen 1D and 2D undersampling masks, respectively. For target-domain data that differ significantly from the source-domain data, the proposed model, fine-tuned with just a few dozen instances of undersampled data from the target domain, achieves reconstruction performance comparable to that of a model trained with the entire set of undersampled data. The novel FCSSL model offers a viable solution for reconstructing high-quality MR images without needing fully sampled datasets, thereby overcoming a major hurdle in scenarios where acquiring fully sampled MR data is difficult.
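    The re-undersampling idea can be sketched as splitting the acquired k-space samples into two disjoint subsets, yielding paired inputs whose reconstructions can be compared without any fully sampled reference; the 50/50 split ratio is an assumption.

```python
import numpy as np

def re_undersample(mask, rng, keep=0.5):
    """mask: boolean k-space sampling mask. Returns two disjoint sub-masks
    that partition the acquired samples."""
    idx = np.flatnonzero(mask)
    pick = rng.random(idx.size) < keep
    m1 = np.zeros_like(mask)
    m2 = np.zeros_like(mask)
    m1.flat[idx[pick]] = True
    m2.flat[idx[~pick]] = True
    return m1, m2

# rng = np.random.default_rng(0)
# m1, m2 = re_undersample(acquired_mask, rng)  # acquired_mask: hypothetical
```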

  • Article type: Journal Article
    Sequential recommendation typically utilizes deep neural networks to mine the rich information in interaction sequences. However, existing methods often face the issue of insufficient interaction data. To alleviate the sparsity issue, self-supervised learning has been introduced into sequential recommendation. Despite its effectiveness, we argue that current self-supervised learning-based (i.e., SSL-based) sequential recommendation models have the following limitations: (1) they use only a single self-supervised learning method, either contrastive or generative; and (2) they employ a simple data augmentation strategy in either the graph structure domain or the node feature domain. We believe that such models neither fully exploit the capabilities of the two self-supervised methods nor sufficiently explore the advantages of combining graph augmentation schemes. As a result, they often fail to learn better item representations. In light of this, we propose a novel multi-task sequential recommendation framework named Adaptive Self-supervised Learning for sequential Recommendation (ASLRec). Specifically, our framework adaptively combines contrastive and generative self-supervised learning methods, simultaneously applying different perturbations at both the graph topology and node feature levels. This approach constructs diverse augmented graph views and employs multiple loss functions (including contrastive loss, generative loss, mask loss, and prediction loss) for joint training. By encompassing the capabilities of both methods, our model learns item representations across different augmented graph views, achieving better performance and effectively mitigating interaction noise and sparsity. In addition, we add a small proportion of random uniform noise to the item representations, making them more uniform and mitigating the inherent popularity bias in interaction records. We conduct extensive experiments on three publicly available benchmark datasets to evaluate our model. The results demonstrate that our approach achieves state-of-the-art performance compared to 14 other competitive methods: the hit rate (HR) improved by over 14.39%, and the normalized discounted cumulative gain (NDCG) increased by over 18.67%.
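    Two concrete pieces of this description, the joint multi-loss objective and the uniform-noise regularization on item embeddings, are easy to sketch; the loss weights and noise magnitude below are assumptions.

```python
import torch

def add_uniform_noise(item_emb, eps=0.01):
    """Perturb item embeddings with small U(-eps, eps) noise to encourage
    a more uniform embedding distribution (eps is an assumed value)."""
    return item_emb + (torch.rand_like(item_emb) - 0.5) * 2 * eps

def joint_loss(l_pred, l_con, l_gen, l_mask, a=0.1, b=0.1, c=0.1):
    """Weighted sum of the four training objectives (weights assumed)."""
    return l_pred + a * l_con + b * l_gen + c * l_mask
```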

  • Article type: Journal Article
    Shape registration of patient-specific organ shapes to endoscopic camera images is expected to be key to realizing image-guided surgery, and a variety of applications of machine learning methods have been considered. Because the number of training data available from clinical cases is limited, the use of synthetic images generated from a statistical deformation model has been attempted; however, the influence on estimation caused by the difference between synthetic images and real scenes is a problem. In this study, we propose a self-supervised offline learning framework for model-based registration using image features commonly obtained from synthetic images and real camera images. Because of the limited number of endoscopic images available for training, we use synthetic images generated from a nonlinear deformation model that represents possible intraoperative pneumothorax deformations. To address the difficulty of estimating deformed shapes and viewpoints from the image features common to synthetic and real images, we reduce the registration error by adding shading and distance information that is available as prior knowledge in the synthetic images. Shape registration with real camera images is performed by learning the task of predicting the differential model parameters between two synthetic images. The developed framework achieved registration accuracy with a mean absolute error of less than 10 mm and a mean distance of less than 5 mm in thoracoscopic pulmonary cancer resection, confirming improved prediction accuracy compared with conventional methods.
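    The pretext task reduces to regressing the difference between the deformation-model parameters that generated two synthetic renderings. A hedged sketch, where render(), the network input format, and the parameter dimensionality are all assumptions:

```python
import torch
import torch.nn.functional as F

def training_step(net, render, opt, param_dim=16):
    """One step of the assumed pretext task: predict p2 - p1 from the two
    synthetic images (e.g., shading/distance renderings) they generated."""
    p1, p2 = torch.randn(param_dim), torch.randn(param_dim)
    img1, img2 = render(p1), render(p2)            # each [H, W]
    pair = torch.stack([img1, img2]).unsqueeze(0)  # [1, 2, H, W] input
    loss = F.mse_loss(net(pair).squeeze(0), p2 - p1)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```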