Self-supervised learning

  • Article type: Journal Article
    Self-supervised monocular depth estimation can perform very well in static environments because training relies on a multi-view consistency assumption. However, depth consistency is hard to maintain in dynamic scenes, where moving objects cause occlusions. For this reason, we propose a self-supervised self-distillation method for monocular depth estimation (SS-MDE) in dynamic scenes, in which a deep network with a multi-scale decoder and a lightweight pose network predict depth in a self-supervised manner from the disparity, motion information, and the association between two adjacent frames in the image sequence. Meanwhile, to improve depth estimation accuracy in static areas, pseudo-depth images generated by the LeReS network provide pseudo-supervision, enhancing depth refinement in those areas. Furthermore, a forgetting factor is used to reduce the dependency on the pseudo-supervision. In addition, a teacher model is introduced to generate depth prior information, and a multi-view mask filter module is designed to perform feature extraction and noise filtering. This enables the student model to better learn the deep structure of dynamic scenes, enhancing the generalization and robustness of the entire model in a self-distillation manner. Finally, on four public datasets, the proposed SS-MDE method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy (δ1) of 89% with an error (AbsRel) of 0.102 on NYU-Depth V2 and an accuracy (δ1) of 87% with an error (AbsRel) of 0.111 on KITTI.
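
The two figures reported above, δ1 and AbsRel, are standard depth-evaluation metrics. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation code: the function name `depth_metrics` and the convention that a ground-truth value of zero marks a missing pixel are assumptions made for illustration.

```python
import numpy as np

def depth_metrics(pred, gt, threshold=1.25):
    """Return (abs_rel, delta1) over pixels with valid ground truth.

    abs_rel: mean of |pred - gt| / gt
    delta1:  fraction of pixels where max(pred/gt, gt/pred) < threshold
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumption: zero marks missing ground-truth depth
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < threshold)
    return abs_rel, delta1

# Toy example: four pixels, the last one has no ground truth.
gt = np.array([1.0, 2.0, 4.0, 0.0])
pred = np.array([1.1, 2.0, 6.0, 3.0])
abs_rel, delta1 = depth_metrics(pred, gt)
print(f"AbsRel = {abs_rel:.3f}, delta1 = {delta1:.3f}")
```

A lower AbsRel and a higher δ1 both indicate better predictions, which is why the abstract reports them together.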






  • Article type: Journal Article
    Computed tomography (CT) denoising is a challenging task in medical imaging that has garnered considerable attention. Supervised networks require many noisy-clean image pairs, which are rarely available in clinical settings. Existing self-supervised algorithms that suppress noise using paired noisy images have limitations, such as ignoring the residual between similar image pairs during training and insufficiently learning the spectral information of images. In this study, we propose a Residual Image Prior Network (RIP-Net) to sufficiently model the residual between paired similar noisy images. Our approach offers new insights into the field by addressing the limitations of existing methods. First, we establish a mathematical theorem clarifying the non-equivalence between similar-image-based self-supervised learning and supervised learning, which helps to better understand the strengths and limitations of self-supervised learning. Second, we introduce a novel regularization term to model a low-frequency residual image prior, which can improve the accuracy and robustness of our model. Finally, we design a well-structured denoising network capable of exploring spectral information while simultaneously sensing contextual information. The network has dual paths for modeling the high- and low-frequency components of the raw noisy image. Additionally, context perception modules capture local and global interactions to produce high-quality images. Comprehensive experiments on preclinical photon-counting CT, clinical brain CT, and low-dose CT datasets demonstrate that RIP-Net is superior to other unsupervised denoising methods.






  • Article type: Journal Article
    Although the human ability to visually understand the structure of the world plays a crucial role in perceiving the world and making appropriate decisions, human perception does not rely solely on vision but combines information from acoustic, verbal, and visual stimuli. An active area of research revolves around designing an efficient framework that adapts to multiple modalities and ideally improves the performance of existing tasks. While numerous frameworks have proved effective on natural datasets like ImageNet, only a limited number of studies have been carried out in the biomedical domain. In this work, we extend the available frameworks for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question, "Among multi-modal learning, self-supervised learning, and joint learning using both strategies, which one most improves the visual representation for downstream chest radiograph classification tasks?" Our experiments indicated that in limited labeled data settings with 1% and 10% labeled data, joint learning with multi-modal and self-supervised models outperforms self-supervised learning and is on par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets. The code is publicly available online.






  • Article type: Journal Article
    Retinal image registration is of utmost importance due to its wide applications in medical practice. In this context, we propose ConKeD, a novel deep learning approach to learn descriptors for retinal image registration. In contrast to current registration methods, our approach employs a novel multi-positive multi-negative contrastive learning strategy that enables the utilization of additional information from the available training samples. This makes it possible to learn high-quality descriptors from limited training data. To train and evaluate ConKeD, we combine these descriptors with domain-specific keypoints, particularly blood vessel bifurcations and crossovers, that are detected using a deep neural network. Our experimental results demonstrate the benefits of the novel multi-positive multi-negative strategy, as it outperforms the widely used triplet loss technique (single-positive and single-negative) as well as the single-positive multi-negative alternative. Additionally, the combination of ConKeD with the domain-specific keypoints produces comparable results to the state-of-the-art methods for retinal image registration, while offering important advantages such as avoiding pre-processing, utilizing fewer training samples, and requiring fewer detected keypoints, among others. Therefore, ConKeD shows a promising potential towards facilitating the development and application of deep learning-based methods for retinal image registration.
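
To illustrate what a multi-positive multi-negative contrastive objective can look like, here is a minimal sketch of a generic InfoNCE-style loss for one anchor descriptor. It is not ConKeD's actual loss: the function name, the cosine-similarity choice, and the temperature value are assumptions made for illustration.

```python
import numpy as np

def multi_pos_multi_neg_loss(anchor, positives, negatives, temperature=0.1):
    """Generic contrastive loss for one anchor with P positives and N negatives.

    For each positive p the term is
        -log( exp(s_p) / (exp(s_p) + sum_n exp(s_n)) ),
    where s is cosine similarity divided by the temperature.
    The loss is the mean of these terms over all positives.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=np.float64))
    pos = normalize(np.asarray(positives, dtype=np.float64))
    neg = normalize(np.asarray(negatives, dtype=np.float64))

    pos_logits = pos @ a / temperature  # shape (P,)
    neg_logits = neg @ a / temperature  # shape (N,)

    # Numerically stable log-sum-exp over all negatives.
    neg_lse = np.logaddexp.reduce(neg_logits)
    per_positive = np.logaddexp(pos_logits, neg_lse) - pos_logits
    return per_positive.mean()

# Hypothetical 2-D descriptors: positives lie near the anchor, negatives far away.
anchor = [1.0, 0.0]
near = [[1.0, 0.0], [0.9, 0.1]]
far = [[-1.0, 0.0], [0.0, 1.0]]
loss_good = multi_pos_multi_neg_loss(anchor, near, far)
loss_bad = multi_pos_multi_neg_loss(anchor, far, near)  # roles swapped
```

With a single positive and a single negative this objective plays a role similar to the triplet loss mentioned above; adding more positives and negatives lets each training step use more of the available samples.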






  • Article type: Journal Article
    OBJECTIVE: Concerns about patient privacy have limited the application of medical deep learning models in certain real-world scenarios. Differential privacy (DP) can alleviate this problem by injecting random noise into the model. However, naively applying DP to medical models does not achieve a satisfactory balance between privacy and utility because of the high dimensionality of medical models and the limited number of labeled samples.
    METHODS: This work proposes DP-SSLoRA, a privacy-preserving classification model for medical images that combines differential privacy with self-supervised low-rank adaptation. A self-supervised pre-training method is used to obtain enhanced representations from unlabeled, publicly available medical data. Then, a low-rank decomposition method is employed to mitigate the impact of differentially private noise and is combined with the pre-trained features to perform the classification task on private datasets.
    RESULTS: In classification experiments on three real chest X-ray datasets, DP-SSLoRA achieves good performance with strong privacy guarantees. With ɛ=2, it attains an AUC of 0.942 on RSNA, 0.9658 on Covid-QU-mini, and 0.9886 on Chest X-ray 15k.
    CONCLUSIONS: Extensive experiments on real chest X-ray datasets show that DP-SSLoRA can achieve satisfactory performance with stronger privacy guarantees. This study provides guidance for research on privacy preservation in the medical field. Source code is publicly available online.
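
The abstract does not say how the noise is injected. A common mechanism is DP-SGD, in which each per-sample gradient is clipped and Gaussian noise is added before averaging; the sketch below assumes that mechanism and is not taken from the DP-SSLoRA code. The helper names `dp_average_gradient` and `lora_param_count`, and the layer sizes in the example, are hypothetical.

```python
import numpy as np

def dp_average_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to clip_norm, add Gaussian noise, average.

    Assumed mechanism: DP-SGD. The noise standard deviation is
    noise_multiplier * clip_norm per coordinate.
    """
    grads = np.asarray(per_sample_grads, dtype=np.float64)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]

def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` update B @ A for a d_out x d_in weight."""
    return rank * (d_in + d_out)

rng = np.random.default_rng(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # norms 5.0 and 0.5
noiseless = dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)

full = 768 * 768                      # hypothetical dense layer
low_rank = lora_param_count(768, 768, rank=8)
```

Because noise is added to every trainable coordinate, training only a low-rank update reduces the number of noised parameters, which is one plausible reading of how low-rank decomposition mitigates the impact of DP noise.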






  • Article type: Journal Article
    Single-cell RNA-sequencing (scRNA-seq) enables the investigation of intricate mechanisms governing cell heterogeneity and diversity. Clustering analysis remains a pivotal tool in scRNA-seq for discerning cell types. However, persistent challenges arise from noise, high dimensionality, and dropout in single-cell data. Despite the proliferation of scRNA-seq clustering methods, these often focus on extracting representations from individual cell expression data, neglecting potential intercellular relationships. To overcome this limitation, we introduce scGAAC, a novel clustering method based on an attention-based graph convolutional autoencoder. By leveraging structural information between cells through a graph attention autoencoder, scGAAC uncovers latent relationships while extracting representation information from single-cell gene expression patterns. An attention fusion module combines the learned features of the graph attention autoencoder and the autoencoder through attention weights. Finally, a self-supervised learning policy guides model optimization. scGAAC, a hypothesis-free framework, performs better on four real scRNA-seq datasets than most state-of-the-art methods. The scGAAC implementation is publicly available on GitHub at:






  • Article type: Journal Article
    Because clinical samples are difficult to obtain and labeling is costly, rare skin diseases are characterized by data scarcity, which makes training deep neural networks for classification challenging. In recent years, few-shot learning has emerged as a promising solution, enabling models to recognize unseen disease classes from limited labeled samples. However, most existing methods ignore the fine-grained nature of rare skin diseases, resulting in poor performance when generalizing to highly similar classes. Moreover, the distributions learned from limited labeled data are biased, severely impairing the model's generalizability. This paper proposes a self-supervision distribution calibration network (SS-DCN) to address these issues. Specifically, SS-DCN adopts a multi-task learning framework during pre-training. By introducing self-supervised tasks to aid supervised learning, the model can learn more discriminative and transferable visual representations. Furthermore, SS-DCN applies an enhanced distribution calibration (EDC) strategy, which uses the statistics of base classes with sufficient samples to calibrate the biased distribution of novel classes with few-shot samples. By generating more samples from the calibrated distribution, EDC can provide sufficient supervision for subsequent classifier training. The proposed method is evaluated on three public skin disease datasets (i.e., ISIC2018, Derm7pt, and SD198), achieving significant performance improvements over state-of-the-art methods.
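
The calibration idea described above can be sketched as follows. This is a basic form of distribution calibration, assuming Gaussian class features and borrowing statistics from the nearest base classes; it is not the enhanced EDC strategy itself, and the function name and the parameters `k` and `alpha` are assumptions made for illustration.

```python
import numpy as np

def calibrate_and_sample(support_feat, base_means, base_covs,
                         k=2, alpha=0.1, n_samples=100, rng=None):
    """Calibrate a novel-class Gaussian from one support feature and sample from it.

    The k base classes whose means are closest to the support feature lend
    their statistics; alpha adds extra spread to the calibrated covariance.
    """
    rng = np.random.default_rng() if rng is None else rng
    support_feat = np.asarray(support_feat, dtype=np.float64)
    base_means = np.asarray(base_means, dtype=np.float64)
    base_covs = np.asarray(base_covs, dtype=np.float64)

    dists = np.linalg.norm(base_means - support_feat, axis=1)
    nearest = np.argsort(dists)[:k]

    mean = (base_means[nearest].sum(axis=0) + support_feat) / (k + 1)
    cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(support_feat.size)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return mean, cov, samples

# Hypothetical 2-D features: three base classes, one novel support sample.
base_means = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
base_covs = [np.eye(2), np.eye(2), np.eye(2)]
mean, cov, samples = calibrate_and_sample(
    [0.5, 0.5], base_means, base_covs, rng=np.random.default_rng(0))
```

The generated samples can then be added to the few labeled support samples when training the classifier for the novel class.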






  • Article type: Journal Article
    Diabetes, characterized by elevated blood sugar levels, can lead to diabetic retinopathy (DR), a condition in which elevated blood sugar damages the retinal blood vessels. DR is thought to be the most common cause of blindness in people with diabetes, particularly among working-age individuals in low-income countries. People with type 1 or type 2 diabetes may develop this illness, and the risk rises with the duration of diabetes and with inadequate blood sugar management. Traditional approaches to the early identification of DR have limitations. In this research, a model based on a convolutional neural network (CNN) is used in a novel way to diagnose DR. The proposed model uses several deep learning (DL) models, such as VGG19, ResNet50, and InceptionV3, to extract features. After concatenation, these features are passed to the CNN for classification. By combining the advantages of several models, ensemble approaches can be effective tools for detecting DR and can increase overall performance and resilience. Ensemble approaches such as the combination of VGG19, InceptionV3, and ResNet50 can achieve high accuracy in tasks such as classification and image recognition. The proposed model is evaluated using a publicly accessible collection of fundus images. VGG19, ResNet50, and InceptionV3 differ in their neural network architectures, feature extraction capabilities, object detection methods, and approaches to retinal delineation. VGG19 may excel in capturing fine details, ResNet50 in recognizing complex patterns, and InceptionV3 in efficiently capturing multi-scale features. Their combined use in an ensemble approach can provide a comprehensive analysis of retinal images, aiding in the delineation of retinal regions and the identification of abnormalities associated with DR.
    For instance, microaneurysms, the earliest signs of DR, often require precise detection of subtle vascular abnormalities. VGG19's proficiency in capturing fine details allows for the identification of these minute changes in retinal morphology. On the other hand, ResNet50's strength lies in recognizing intricate patterns, making it effective in detecting neovascularization and complex haemorrhagic lesions. Meanwhile, InceptionV3's multi-scale feature extraction enables comprehensive analysis, crucial for assessing macular oedema and ischaemic changes across different retinal layers.






  • Article type: Journal Article
    Time series are a typical data type in numerous domains; however, labeling large amounts of time series data can be costly and time-consuming. Learning effective representations from unlabeled time series data is a challenging task, and contrastive learning stands out as a promising way to acquire them. We therefore propose a self-supervised time-series representation learning framework via Time-Frequency Fusion Contrasting (TF-FC) to learn time-series representations from unlabeled data. Specifically, TF-FC combines time-domain augmentation with frequency-domain augmentation to generate diverse samples. For time-domain augmentation, the raw time series data pass through a time-domain augmentation bank (such as jitter, scaling, permutation, and masking) to produce time-domain augmented data. For frequency-domain augmentation, the raw time series is first converted into frequency-domain data by the Fast Fourier Transform (FFT). The frequency data then pass through a frequency-domain augmentation bank (such as low-pass filtering, frequency removal, frequency addition, and phase shift) to produce frequency-domain augmented data. The time-domain and frequency-domain augmented data are fused with kernel PCA, which is useful for extracting nonlinear features in high-dimensional spaces. By capturing both the time and frequency domains of the time series, the proposed approach is able to extract more informative features from the data, enhancing the model's capacity to distinguish between different time series. To verify the effectiveness of the TF-FC method, we conducted experiments on four time series datasets (i.e., SleepEEG, HAR, Gesture, and Epilepsy). Experimental results show that TF-FC significantly improves recognition accuracy compared with other SOTA methods.
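
The augmentation banks named above can be sketched for a 1-D series as follows. This is a minimal illustration, not the TF-FC implementation: the function names, default parameters, and the toy signal are assumptions, only one frequency-domain operation (low-pass filtering) is shown, and the kernel PCA fusion step is omitted.

```python
import numpy as np

def jitter(x, rng, sigma=0.05):
    """Add small Gaussian noise to every time step."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scaling(x, rng, sigma=0.1):
    """Multiply the whole series by one random factor close to 1."""
    return x * rng.normal(1.0, sigma)

def permutation(x, rng, n_segments=4):
    """Split the series into segments and shuffle their order."""
    segments = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])

def masking(x, rng, mask_ratio=0.1):
    """Set a random subset of time steps to zero."""
    out = x.copy()
    out[rng.random(x.shape) < mask_ratio] = 0.0
    return out

def low_pass(x, keep_ratio=0.25):
    """Zero all FFT bins above a cutoff and transform back to the time domain."""
    spectrum = np.fft.rfft(x)
    cutoff = max(1, int(len(spectrum) * keep_ratio))
    spectrum[cutoff:] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

# Toy signal: a slow 2 Hz component plus a fast 40 Hz component.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128, endpoint=False)
slow = np.sin(2 * np.pi * 2 * t)
x = slow + 0.5 * np.sin(2 * np.pi * 40 * t)

time_view = masking(permutation(scaling(jitter(x, rng), rng), rng), rng)
freq_view = low_pass(x)  # keeps the slow component, removes the fast one
```

In a contrastive setup, `time_view` and `freq_view` would be treated as two augmented views of the same underlying series.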






  • Article type: Journal Article
    Efficient multi-modal image fusion plays an important role in the non-destructive evaluation (NDE) of infrastructure, where an essential challenge is the precise visualization of defects. While automatic defect detection represents a significant advancement, determining the precise location of both surface and subsurface defects simultaneously is crucial. Hence, visible and infrared data fusion strategies are essential for acquiring comprehensive and complementary information to detect defects across vast structures. This paper proposes an infrared and visible image registration method based on Euclidean evaluation together with a trade-off between key-point threshold and non-maximum suppression. Moreover, we employ a multi-modal fusion strategy to investigate the robustness of our image registration results.





