Keywords: Multi-Modal Learning; Out-of-Distribution; Radiology; Self-Supervised Learning


Abstract:
Although humans' ability to visually understand the structure of the world plays a crucial role in perceiving the world and making appropriate decisions, human perception does not rely solely on vision but amalgamates information from acoustic, verbal, and visual stimuli. An active area of research has revolved around designing an efficient framework that adapts to multiple modalities and ideally improves the performance of existing tasks. While numerous frameworks have proved effective on natural datasets like ImageNet, only a limited number of studies have been carried out in the biomedical domain. In this work, we extend the frameworks available for natural data to biomedical data by leveraging the abundant, unstructured multi-modal data available as radiology images and reports. We attempt to answer the question, "Among multi-modal learning, self-supervised learning, and joint learning using both strategies, which one improves the visual representation for downstream chest radiograph classification tasks the most?" Our experiments indicated that in limited labeled data settings with 1% and 10% labeled data, joint learning with multi-modal and self-supervised models outperforms self-supervised learning alone and is on par with multi-modal learning. Additionally, we found that multi-modal learning is generally more robust on out-of-distribution datasets. The code is publicly available online.
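To make the notion of "joint learning using both strategies" concrete, the following is a minimal sketch, not the authors' exact formulation, of how a joint objective could combine an image-report contrastive term (multi-modal, CLIP/ConVIRT-style) with an image-image contrastive term (self-supervised, SimCLR-style). All function names, variable names, and the weighting hyperparameter alpha are illustrative assumptions.

```python
# Illustrative sketch of a joint multi-modal + self-supervised contrastive objective.
# Assumes precomputed embeddings; names and the alpha weight are assumptions,
# not the paper's exact method.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of paired embeddings (N, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                 # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def joint_loss(img_emb, txt_emb, img_view1_emb, img_view2_emb, alpha: float = 0.5):
    """Weighted sum of the multi-modal and self-supervised contrastive terms.

    img_emb / txt_emb:             embeddings of a chest radiograph and its report.
    img_view1_emb / img_view2_emb: embeddings of two augmented views of the radiograph.
    alpha:                         assumed trade-off weight between the two terms.
    """
    multimodal = info_nce(img_emb, txt_emb)              # image <-> report alignment
    selfsup = info_nce(img_view1_emb, img_view2_emb)     # view <-> view agreement
    return alpha * multimodal + (1.0 - alpha) * selfsup
```

Under this reading, setting alpha to 1 recovers a purely multi-modal objective and setting it to 0 a purely self-supervised one, which is the comparison the abstract describes.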