Self-supervised learning

  • Article type: Journal Article
    The rapid dissemination of unverified information through social platforms such as Twitter poses considerable dangers to societal stability. Distinguishing real from fake claims is challenging, and previous rumor detection methods often fail to capture propagation-structure features effectively. These methods also often overlook comments that are irrelevant to the discussion topic of the source post. To address this, we introduce a novel approach, the Structure-Aware Multilevel Graph Attention Network (SAMGAT), for rumor classification. SAMGAT employs a dynamic attention mechanism that blends GATv2 and dot-product attention to capture the contextual relationships between posts, allowing attention scores to vary with the stance of the central node. The model incorporates a structure-aware attention mechanism that learns attention weights indicating the existence of edges, effectively reflecting the propagation structure of rumors. Moreover, SAMGAT incorporates a top-k attention filtering mechanism to select the most relevant neighboring nodes, enhancing its ability to focus on the key structural features of rumor propagation. Furthermore, SAMGAT includes a claim-guided attention pooling mechanism with a thresholding step to focus on the most informative posts when constructing the event representation. Experimental results on benchmark datasets demonstrate that SAMGAT outperforms state-of-the-art methods in identifying rumors and improves the effectiveness of early rumor detection.
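    To make the attention design concrete, here is a minimal PyTorch sketch of a blended GATv2/dot-product score with top-k neighbor filtering, written from the abstract alone; the class name, the fixed mixing weight alpha, and the dense adjacency interface are illustrative assumptions, not the authors' released code.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BlendedTopKAttention(nn.Module):
        """SAMGAT-style attention sketch: mix a GATv2 score with a scaled
        dot-product score, then keep only the k strongest neighbors."""

        def __init__(self, dim: int, k: int = 4, alpha: float = 0.5):
            super().__init__()
            self.W = nn.Linear(dim, dim, bias=False)      # shared GATv2 projection
            self.a = nn.Linear(dim, 1, bias=False)        # GATv2 scoring vector
            self.q = nn.Linear(dim, dim, bias=False)      # dot-product query
            self.kproj = nn.Linear(dim, dim, bias=False)  # dot-product key
            self.k, self.alpha, self.dim = k, alpha, dim

        def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
            # h: (N, dim) post embeddings; adj: (N, N) 0/1 propagation edges
            n = h.size(0)
            wh = self.W(h)
            e_gat = self.a(F.leaky_relu(wh.unsqueeze(1) + wh.unsqueeze(0))).squeeze(-1)
            e_dot = (self.q(h) @ self.kproj(h).T) / self.dim ** 0.5
            e = self.alpha * e_gat + (1 - self.alpha) * e_dot   # blended score
            e = e.masked_fill(adj == 0, float("-inf"))          # respect structure
            # top-k filtering: drop all but the k largest scores per node
            kth = e.topk(min(self.k, n), dim=-1).values[:, -1:]
            e = e.masked_fill(e < kth, float("-inf"))
            attn = torch.softmax(e, dim=-1).nan_to_num(0.0)     # isolated rows -> 0
            return attn @ wh
    ```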
  • Article type: Journal Article
    Nowadays, autonomous driving technology has become widely prevalent. Intelligent vehicles are equipped with various sensors (e.g., vision sensors, LiDAR, and depth cameras). Among them, vision systems with tailored semantic segmentation and perception algorithms play a critical role in scene understanding. However, traditional supervised semantic segmentation requires a large number of pixel-level manual annotations for model training. Although few-shot methods reduce the annotation work to some extent, they are still labor-intensive. In this paper, a self-supervised few-shot semantic segmentation method based on Multi-task Learning and Dense Attention Computation (dubbed MLDAC) is proposed. The salient part of an image is split into two parts: one serves as the support mask for few-shot segmentation, while cross-entropy losses between the predicted results and, respectively, the other part and the entire salient region are computed as multi-task learning objectives to improve the model's generalization ability. Swin Transformer is used as our backbone to extract feature maps at different scales. These feature maps are then fed into multiple levels of dense attention computation blocks to enhance pixel-level correspondence. The final prediction is obtained through inter-scale mixing and feature skip connections. Experimental results indicate that MLDAC achieves 55.1% and 26.8% one-shot mIoU for self-supervised few-shot segmentation on the PASCAL-5i and COCO-20i datasets, respectively. In addition, it achieves 78.1% on the FSS-1000 few-shot dataset, proving its efficacy.
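    As a concrete illustration of the episode construction, here is a short PyTorch sketch under stated assumptions (the horizontal-split heuristic, the model interface, and the loss pairing are mine; the paper's implementation may differ): the salient region is halved, one half acts as the support mask, and the prediction is scored against both the other half and the full salient region.

    ```python
    import torch
    import torch.nn.functional as F

    def self_supervised_episode(image, saliency, model):
        # saliency: (H, W) binary mask of the salient object
        xs = saliency.nonzero(as_tuple=True)[1]        # columns of salient pixels
        mid = int(xs.float().median().item())          # horizontal midpoint
        support = saliency.clone()
        support[:, mid:] = 0                           # left half -> support mask
        other = saliency.clone()
        other[:, :mid] = 0                             # right half -> first target
        logits = model(image, support)                 # (2, H, W) fg/bg logits
        loss_other = F.cross_entropy(logits.unsqueeze(0), other.long().unsqueeze(0))
        loss_full = F.cross_entropy(logits.unsqueeze(0), saliency.long().unsqueeze(0))
        return loss_other + loss_full                  # multi-task objective
    ```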
  • Article type: Journal Article
    Robust object detection in complex environments, poor visual conditions, and open scenarios presents significant technical challenges in autonomous driving. These challenges necessitate the development of advanced fusion methods for millimeter-wave (mmWave) radar point cloud data and visual images. To address these issues, this paper proposes a radar-camera robust fusion network (RCRFNet), which leverages self-supervised learning and open-set recognition to effectively utilise the complementary information from both sensors. Specifically, the network uses matched radar-camera data through a frustum association approach to generate self-supervised signals, enhancing network training. The integration of global and local depth consistencies between radar point clouds and visual images, along with image features, helps construct object class confidence levels for detecting unknown targets. Additionally, these techniques are combined with a multi-layer feature extraction backbone and a multimodal feature detection head to achieve robust object detection. Experiments on the nuScenes public dataset demonstrate that RCRFNet outperforms state-of-the-art (SOTA) methods, particularly in conditions of low visual visibility and when detecting unknown class objects.
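    The frustum association step can be pictured with a few lines of NumPy; this is a hedged reading of the abstract, with the intrinsics-only projection and the box-containment test as simplifying assumptions rather than the network's actual matching code.

    ```python
    import numpy as np

    def frustum_match(radar_xyz, boxes_2d, K):
        # radar_xyz: (N, 3) points in camera coordinates; K: (3, 3) intrinsics
        uvw = (K @ radar_xyz.T).T                 # project onto the image plane
        uv = uvw[:, :2] / uvw[:, 2:3]             # perspective divide
        matches = {}
        for b, (x1, y1, x2, y2) in enumerate(boxes_2d):
            inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2)
                      & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
                      & (radar_xyz[:, 2] > 0))    # in front of the camera
            matches[b] = np.flatnonzero(inside)   # radar indices per 2D box
        return matches
    ```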
  • Article type: Journal Article
    The implementation of deep learning in Magnetic Resonance Imaging (MRI) has significantly reduced data acquisition times. However, these techniques face substantial limitations in scenarios where acquiring fully sampled datasets is unfeasible or costly. To tackle this problem, we propose a Fusion-enhanced Contrastive Self-Supervised Learning (FCSSL) method for parallel MRI reconstruction, eliminating the need for fully sampled k-space training datasets and coil sensitivity maps. First, we introduce a strategy based on two pairs of re-undersampling masks within a contrastive learning framework, aimed at enhancing the representational capacity to achieve higher-quality reconstruction. Subsequently, a novel adaptive fusion network, trained in a self-supervised manner, is designed to integrate the reconstruction results of the framework. Experimental results on knee datasets under different sampling masks demonstrate that the proposed FCSSL achieves superior reconstruction performance compared to other self-supervised learning methods. Moreover, the performance of FCSSL approaches that of supervised methods, especially under the 2DRU and RADU masks. The proposed FCSSL, trained under the 3× 1DRU and 2DRU masks, can effectively generalize to unseen 1D and 2D undersampling masks, respectively. For target-domain data that differ significantly from the source domain, the proposed model, fine-tuned with just a few dozen instances of undersampled target-domain data, achieves reconstruction performance comparable to that of a model trained on the entire undersampled dataset. The novel FCSSL model offers a viable solution for reconstructing high-quality MR images without needing fully sampled datasets, thereby overcoming a major hurdle in scenarios where acquiring fully sampled MR data is difficult.
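    A minimal sketch of the re-undersampling idea, assuming a single-coil complex k-space and an unspecified reconstruction network `net` (both assumptions; the paper works with parallel, multi-coil data): the measured lines are split by a random sub-mask into two views, each is reconstructed, and the two results are pulled together while staying faithful to the measured data.

    ```python
    import torch
    import torch.nn.functional as F

    def fcssl_step(kspace, mask, net):
        # kspace: (H, W) complex measurements; mask: (H, W) float {0, 1}
        keep = (torch.rand_like(mask) > 0.5).float()
        sub1 = mask * keep                        # first re-undersampling mask
        sub2 = mask * (1.0 - keep)                # complementary second mask
        img1 = net(kspace * sub1, sub1)           # reconstruction from view 1
        img2 = net(kspace * sub2, sub2)           # reconstruction from view 2
        consistency = F.l1_loss(img1, img2)       # pull the two views together
        k1 = torch.fft.fft2(img1)
        data_fid = ((k1 - kspace) * mask).abs().mean()  # respect measured lines
        return consistency + data_fid
    ```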
  • Article type: Journal Article
    Sequential recommendation typically utilizes deep neural networks to mine rich information in interaction sequences. However, existing methods often face the issue of insufficient interaction data. To alleviate this sparsity issue, self-supervised learning has been introduced into sequential recommendation. Despite its effectiveness, we argue that current self-supervised learning-based (i.e., SSL-based) sequential recommendation models have the following limitations: (1) they use only a single self-supervised learning method, either contrastive or generative; and (2) they employ a simple data augmentation strategy in either the graph structure domain or the node feature domain. We believe that they have not fully exploited the capabilities of both self-supervised methods and have not sufficiently explored the advantages of combining graph augmentation schemes. As a result, they often fail to learn better item representations. In light of this, we propose a novel multi-task sequential recommendation framework named Adaptive Self-supervised Learning for sequential Recommendation (ASLRec). Specifically, our framework adaptively combines contrastive and generative self-supervised learning methods, simultaneously applying different perturbations at both the graph topology and node feature levels. This approach constructs diverse augmented graph views and employs multiple loss functions (including contrastive, generative, mask, and prediction losses) for joint training. By encompassing the capabilities of various methods, our model learns item representations across different augmented graph views to achieve better performance and effectively mitigate interaction noise and sparsity. In addition, we add a small proportion of random uniform noise to the item representations, making them more uniform and mitigating the inherent popularity bias in interaction records. We conduct extensive experiments on three publicly available benchmark datasets to evaluate our model. The results demonstrate that our approach achieves state-of-the-art performance compared to 14 other competitive methods: the hit rate (HR) improves by over 14.39%, and the normalized discounted cumulative gain (NDCG) by over 18.67%.
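    For concreteness, below is a standard InfoNCE contrastive term between two augmented graph views plus the uniform-noise injection mentioned above; this is a textbook formulation under assumed hyperparameters (tau, noise scale), not necessarily ASLRec's exact losses.

    ```python
    import torch
    import torch.nn.functional as F

    def contrastive_loss(z1, z2, tau=0.2):
        # z1, z2: (N, d) item embeddings from two augmented graph views
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        logits = z1 @ z2.T / tau                             # (N, N) similarities
        labels = torch.arange(z1.size(0), device=z1.device)  # positives on diagonal
        return F.cross_entropy(logits, labels)

    def add_uniform_noise(item_emb, scale=0.01):
        # small U(-scale, scale) noise to make representations more uniform
        return item_emb + (torch.rand_like(item_emb) - 0.5) * 2 * scale
    ```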
  • Article type: Journal Article
    Electroencephalography (EEG) has demonstrated significant value in diagnosing brain diseases. In particular, brain networks have gained prominence as they offer additional valuable insights by establishing connections between EEG signal channels. While brain connections are typically delineated by channel signal similarity, there is no consistent and reliable strategy for determining node features. Conventional node features such as temporal- and frequency-domain properties of EEG signals prove inadequate for capturing the extensive information in EEG. In our investigation, we introduce a novel adaptive method for extracting node features from EEG signals using a distinctive task-induced self-supervised learning technique. By combining these extracted node features with fundamental edge features constructed using Pearson correlation coefficients, we show that the proposed approach can function as a plug-in module that can be integrated into many common GNNs (e.g., GCN, GraphSAGE, GAT) as a replacement for the node-feature-selection module. Comprehensive experiments then demonstrate the consistently superior performance and high generality of the proposed method over other feature selection methods in various brain disorder prediction tasks, such as depression, schizophrenia, and Parkinson's disease. Furthermore, compared to other node features, our approach unveils profound spatial patterns through graph pooling and structural learning, shedding light on pivotal brain regions that influence the prediction of various brain disorders.
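    The plug-in graph construction can be summarized in a few lines; the threshold and encoder interface below are assumptions for illustration, with only the Pearson-correlation edges taken directly from the abstract.

    ```python
    import numpy as np

    def build_eeg_graph(signals, node_encoder, threshold=0.3):
        # signals: (C, T) array, one row per EEG channel
        node_feats = np.stack([node_encoder(ch) for ch in signals])  # (C, d)
        corr = np.corrcoef(signals)                    # (C, C) Pearson matrix
        adj = np.where(np.abs(corr) >= threshold, corr, 0.0)  # sparsified edges
        np.fill_diagonal(adj, 0.0)                     # drop self-loops
        return node_feats, adj
    ```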
  • Article type: Journal Article
    Recently, large pretrained vision foundation models based on masked image modeling (MIM) have attracted unprecedented attention and achieved remarkable performance across various tasks. However, the study of MIM for ultrasound imaging remains relatively unexplored, and most importantly, current MIM approaches fail to account for the gap between natural images and ultrasound, as well as the intrinsic imaging characteristics of the ultrasound modality, such as its high noise-to-signal ratio. In this paper, motivated by the uniquely high noise-to-signal ratio of ultrasound, we propose a deblurring MIM approach specialized for ultrasound, which incorporates a deblurring task into the pretraining proxy task. The incorporation of deblurring helps pretraining better recover the subtle details within ultrasound images that are vital for subsequent downstream analysis. Furthermore, we employ a multi-scale hierarchical encoder to extract both local and global contextual cues for improved performance, especially on pixel-wise tasks such as segmentation. We conduct extensive experiments involving 280,000 ultrasound images for pretraining and evaluate the downstream transfer performance of the pretrained model on various disease diagnoses (nodules, Hashimoto's thyroiditis) and task types (classification, segmentation). The experimental results demonstrate the efficacy of the proposed deblurring MIM, achieving state-of-the-art performance across a wide range of downstream tasks and datasets. Overall, our work highlights the potential of deblurring MIM for ultrasound image analysis, presenting an ultrasound-specific vision foundation model.
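    The proxy task can be sketched as follows, with the box blur, patch size, and masking scheme as stand-in assumptions: the encoder sees a masked, blurred image, and the loss asks for the original sharp pixels back on the masked patches.

    ```python
    import torch
    import torch.nn.functional as F

    def deblurring_mim_step(images, encoder, decoder, mask_ratio=0.75, ksize=5):
        # images: (B, 1, H, W) sharp ultrasound frames
        kernel = torch.ones(1, 1, ksize, ksize) / ksize ** 2   # box blur
        blurred = F.conv2d(images, kernel, padding=ksize // 2)
        B, _, H, W = images.shape
        p = 16                                                 # patch size
        keep = (torch.rand(B, 1, H // p, W // p) > mask_ratio).float()
        keep = F.interpolate(keep, size=(H, W), mode="nearest")
        pred = decoder(encoder(blurred * keep))                # reconstruct
        # supervise only the masked-out patches, against the sharp target
        masked = 1.0 - keep
        return ((pred - images) ** 2 * masked).sum() / masked.sum().clamp(min=1.0)
    ```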
  • Article type: Journal Article
    Background: Soft tissue sarcomas, similar in incidence to cervical and esophageal cancers, arise from various soft tissues such as smooth muscle, fat, and fibrous tissue. Effective segmentation of sarcomas in imaging is crucial for accurate diagnosis.
    Methods: This study collected multi-modal MRI images from 45 patients with thigh soft tissue sarcoma, totaling 8,640 images. These images were annotated by clinicians to delineate the sarcoma regions, creating a comprehensive dataset. We developed a novel segmentation model based on the UNet framework, enhanced with residual networks and attention mechanisms for improved modality-specific information extraction. Additionally, self-supervised learning strategies were employed to optimize the feature extraction capabilities of the encoders.
    Results: The new model demonstrated superior segmentation performance when using multi-modal MRI images compared to single-modal inputs. The effectiveness of the model in utilizing the created dataset was validated through various experimental setups, confirming its enhanced ability to characterize tumor regions across different modalities.
    Conclusions: The integration of multi-modal MRI images and advanced machine learning techniques in our model significantly improves the segmentation of soft tissue sarcomas in thigh imaging. This advancement aids clinicians in better diagnosing and understanding the patient's condition by leveraging the strengths of different imaging modalities. Further studies could explore the application of these techniques to other types of soft tissue sarcomas and additional anatomical sites.
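    As an illustration of the attention mechanism the Methods section mentions, here is a standard Attention U-Net gate in PyTorch; it is a common choice for this role, not necessarily the authors' exact block.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionGate(nn.Module):
        """Reweight UNet skip features using a gating signal from the
        coarser decoder level (standard Attention U-Net formulation)."""

        def __init__(self, skip_ch, gate_ch, inter_ch):
            super().__init__()
            self.w_skip = nn.Conv2d(skip_ch, inter_ch, 1)
            self.w_gate = nn.Conv2d(gate_ch, inter_ch, 1)
            self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.Sigmoid())

        def forward(self, skip, gate):
            gate = F.interpolate(gate, size=skip.shape[2:],
                                 mode="bilinear", align_corners=False)
            attn = self.psi(torch.relu(self.w_skip(skip) + self.w_gate(gate)))
            return skip * attn                    # attended skip connection
    ```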
  • Article type: Journal Article
    Quantum Architecture Search (QAS) has shown significant promise in designing quantum circuits for Variational Quantum Algorithms (VQAs). However, existing QAS algorithms primarily explore circuit architectures within a discrete space, which is inherently inefficient. In this paper, we propose a Gradient-based Optimization for Quantum Architecture Search (GQAS), which leverages a circuit encoder, decoder, and predictor. Initially, the encoder embeds circuit architectures into a continuous latent representation. Subsequently, a predictor utilizes this continuous latent representation as input and outputs an estimated performance for the given architecture. The latent representation is then optimized through gradient descent within the continuous latent space based on the predicted performance. The optimized latent representation is finally mapped back to a discrete architecture via the decoder. To enhance the quality of the latent representation, we pre-train the encoder on a substantial dataset of circuit architectures using Self-Supervised Learning (SSL). Our simulation results on the Variational Quantum Eigensolver (VQE) indicate that our method outperforms the current Differentiable Quantum Architecture Search (DQAS).
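    The search loop reduces to a few lines once the encoder, decoder, and predictor exist; the step count and learning rate below are assumptions, and the three components are treated as black boxes.

    ```python
    import torch

    def gqas_search(arch, encoder, decoder, predictor, steps=100, lr=0.05):
        z = encoder(arch).detach().requires_grad_(True)   # continuous embedding
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            (-predictor(z)).backward()    # ascend the predicted performance
            opt.step()
        return decoder(z.detach())        # map back to a discrete circuit
    ```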
  • Article type: Journal Article
    Self-supervised monocular depth estimation can achieve excellent performance in static environments thanks to the multi-view consistency assumption during training. However, it is hard to maintain depth consistency in dynamic scenes when considering the occlusion caused by moving objects. For this reason, we propose a method of self-supervised self-distillation for monocular depth estimation (SS-MDE) in dynamic scenes, where a depth network with a multi-scale decoder and a lightweight pose network are designed to predict depth in a self-supervised manner via the disparity, motion information, and the association between two adjacent frames in the image sequence. Meanwhile, in order to improve the depth estimation accuracy of static areas, pseudo-depth images generated by the LeReS network are used to provide pseudo-supervision, enhancing the effect of depth refinement in static areas. Furthermore, a forgetting factor is leveraged to alleviate the dependency on this pseudo-supervision. In addition, a teacher model is introduced to generate depth prior information, and a multi-view mask filter module is designed to implement feature extraction and noise filtering. This enables the student model to better learn the deep structure of dynamic scenes, enhancing the generalization and robustness of the entire model in a self-distillation manner. Finally, on four public datasets, the proposed SS-MDE method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy (δ1) of 89% with an absolute relative error (AbsRel) of 0.102 on NYU-Depth V2 and an accuracy (δ1) of 87% with an AbsRel of 0.111 on KITTI.
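    The forgetting factor amounts to an exponentially decaying weight on the pseudo-supervision term; a one-line sketch (gamma and the schedule are assumed, and the individual loss terms are left abstract):

    ```python
    def ssmde_loss(photometric, pseudo_depth, distill, epoch, gamma=0.95):
        # decay the LeReS pseudo-supervision as training proceeds
        return photometric + gamma ** epoch * pseudo_depth + distill
    ```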