Keywords: Chest X-rays; Deep learning; Informative sample selection; Semantic redundancy; Statistical significance

MeSH: Deep Learning; Humans; Semantics; Radiography, Thoracic

Source: DOI: 10.1016/j.compmedimag.2024.102379

Abstract:
Deep learning (DL) has demonstrated an innate capacity to independently learn hierarchical features from complex, multi-dimensional data. A common understanding is that its performance scales with the amount of training data. However, the data must also exhibit variety to enable improved learning. In medical imaging data, semantic redundancy, that is, the presence of similar or repetitive information, can occur when multiple images present the disease of interest in highly similar ways. Moreover, the augmentation methods commonly used to generate variety in DL training can limit performance when applied indiscriminately to such data. We therefore hypothesize that semantic redundancy lowers performance and limits generalizability to unseen data, and we question whether it impairs classifier performance even when large amounts of data are available. We propose an entropy-based sample scoring approach to identify and remove semantically redundant training data, and we demonstrate on the publicly available NIH chest X-ray dataset that the model trained on the resulting informative subset of training data significantly outperforms the model trained on the full training set, in both internal (recall: 0.7164 vs 0.6597, p<0.05) and external testing (recall: 0.3185 vs 0.2589, p<0.05). Our findings emphasize the importance of information-oriented training sample selection over the conventional practice of using all available training data.
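The abstract does not spell out the scoring rule, so the following Python is a minimal sketch of one plausible reading: score each training sample by the Shannon entropy of a trained model's softmax output and keep only the highest-entropy (least redundant) subset. The function names and the `keep_fraction` cutoff are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def entropy_scores(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of each sample's predicted class distribution.

    probs: (n_samples, n_classes) softmax outputs from a trained model.
    Higher entropy suggests the sample carries information the model
    has not already absorbed from semantically similar images.
    """
    p = np.clip(probs, eps, 1.0)          # guard against log(0)
    return -(p * np.log(p)).sum(axis=1)

def select_informative(probs: np.ndarray, keep_fraction: float = 0.5) -> np.ndarray:
    """Indices of the top-`keep_fraction` highest-entropy samples.

    `keep_fraction` is a hypothetical knob; the actual cutoff used in
    the study is not stated in the abstract.
    """
    scores = entropy_scores(probs)
    n_keep = max(1, int(len(scores) * keep_fraction))
    return np.argsort(scores)[::-1][:n_keep]  # highest-entropy first

# Sketch of use: score the full training set with a baseline model,
# retrain on the selected subset, then compare recall on internal and
# external test sets, mirroring the evaluation described above.
# informative_idx = select_informative(model_probs, keep_fraction=0.5)
```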