Keywords: feature extraction; label noise; learning with noisy labels; medical image classification; self-supervised pretraining; warm-up obstacle

Source: DOI: 10.1007/978-3-031-44992-5_8

Abstract:
Noisy labels hurt the performance of deep learning-based supervised image classification, as models may overfit the noise and learn corrupted feature extractors. For natural image classification with noisy labeled training data, initializing the model with contrastive self-supervised pretrained weights has been shown to reduce feature corruption and improve classification performance. However, no prior work has explored: i) how other self-supervised approaches, such as pretext task-based pretraining, affect learning with noisy labels, and ii) whether any self-supervised pretraining method alone helps in noisy label settings for medical images. Medical images often come in smaller datasets with subtle inter-class variations, requiring human expertise to ensure correct classification. It is therefore unclear whether methods that improve learning with noisy labels on natural image datasets such as CIFAR also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels: NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained through self-supervised learning learn better features and are more robust against noisy labels.
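To make the described setup concrete, below is a minimal PyTorch sketch of the two ingredients the abstract combines: injecting symmetric (self-induced) label noise into a training set, and initializing a classifier backbone from self-supervised pretrained weights before supervised fine-tuning. The ResNet-18 backbone, the 40% noise rate, the 9-class label space, and the `ssl_ckpt` checkpoint path are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: symmetric label-noise injection + SSL warm-start of a classifier.
from typing import Optional

import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet18


def inject_symmetric_noise(labels: np.ndarray, noise_rate: float,
                           num_classes: int, seed: int = 0) -> np.ndarray:
    """Flip a `noise_rate` fraction of labels uniformly to other classes."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip_idx = rng.choice(len(labels), size=int(noise_rate * len(labels)),
                          replace=False)
    for i in flip_idx:
        wrong = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong)  # pick a wrong class uniformly at random
    return noisy


def build_model(num_classes: int, ssl_ckpt: Optional[str] = None) -> nn.Module:
    """ResNet-18 whose backbone can be warm-started from SSL weights."""
    model = resnet18(weights=None)
    if ssl_ckpt is not None:
        # Hypothetical checkpoint from e.g. contrastive (SimCLR-style) or
        # pretext task pretraining; strict=False ignores keys belonging to
        # the SSL projection head that the classifier does not use.
        state = torch.load(ssl_ckpt, map_location="cpu")
        model.load_state_dict(state, strict=False)
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh classification head
    return model


if __name__ == "__main__":
    labels = np.random.randint(0, 9, size=1000)  # e.g. 9 tissue classes (NCT-CRC-HE-100K)
    noisy = inject_symmetric_noise(labels, noise_rate=0.4, num_classes=9)
    print("fraction of corrupted labels:", (labels != noisy).mean())
    model = build_model(num_classes=9)  # pass ssl_ckpt="..." to warm-start the backbone
```

After this initialization, the model would be fine-tuned on the noisy-labeled images with a standard cross-entropy loss; the paper's comparison is between this SSL warm-start and training the same architecture from random initialization.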