Keywords: artificial intelligence; breast cancer detection; generalization evaluation; transfer learning; unseen population

MeSH terms: Humans; Breast Neoplasms / diagnostic imaging; Female; Case-Control Studies; Artificial Intelligence; Middle Aged; Retrospective Studies; Adult; Finland; Aged; Transfer, Psychology; Mammography / methods; Breast / diagnostic imaging

Source: DOI:10.1177/02841851231218960

Abstract:
BACKGROUND: Some researchers have questioned whether artificial intelligence (AI) systems maintain their performance when used for women from populations not considered during the development of the system.
OBJECTIVE: To evaluate the impact of transfer learning as a way of improving the generalization of AI systems in the detection of breast cancer.
METHODS: This retrospective case-control Finnish study involved 191 women diagnosed with breast cancer and 191 matched healthy controls. We selected a state-of-the-art AI system for breast cancer detection trained using a large US dataset. The selected baseline system was evaluated in two experimental settings. First, we examined our private Finnish sample as an independent test set that had not been considered in the development of the system (unseen population). Second, the baseline system was retrained to attempt to improve its performance in the unseen population by means of transfer learning. To analyze performance, we used areas under the receiver operating characteristic curve (AUCs) with DeLong's test.
RESULTS: Two versions of the baseline system were considered: ImageOnly and Heatmaps. The ImageOnly and Heatmaps versions yielded mean AUC values of 0.82±0.008 and 0.88±0.003, respectively, in the US dataset, and 0.56 (95% CI=0.50-0.62) and 0.72 (95% CI=0.67-0.77), respectively, when evaluated in the unseen population. The retrained systems achieved AUC values of 0.61 (95% CI=0.55-0.66) and 0.69 (95% CI=0.64-0.75), respectively. There was no statistically significant difference between the baseline system and the retrained system.
CONCLUSIONS: Transfer learning with a small study sample did not yield a significant improvement in the generalization of the system.
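
The abstract names DeLong's test as the method for comparing AUCs but does not give implementation details. The following is only a minimal sketch of how a paired DeLong comparison of two correlated AUCs could be computed, assuming both systems (for example, a baseline and a retrained model) scored the same cases and controls. The function names (`delong_components`, `delong_paired_test`) are hypothetical and are not taken from the study; the study's actual analysis code and software are not described in the source.

```python
import math


def _psi(case_score, control_score):
    """Mann-Whitney kernel: 1 if the case outranks the control, 0.5 on ties, else 0."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def delong_components(case_scores, control_scores):
    """Return the empirical AUC and DeLong's structural components (V10, V01)."""
    m, n = len(case_scores), len(control_scores)
    v10 = [sum(_psi(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(_psi(x, y) for x in case_scores) / m for y in control_scores]
    auc = sum(v10) / m
    return auc, v10, v01


def _sample_cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    k = len(a)
    mean_a, mean_b = sum(a) / k, sum(b) / k
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (k - 1)


def delong_paired_test(cases_a, controls_a, cases_b, controls_b):
    """Two-sided DeLong test for two AUCs measured on the same subjects.

    Assumption: cases_a[i] and cases_b[i] are scores for the same woman
    (likewise for controls), so the two AUC estimates are correlated.
    Returns (auc_a, auc_b, z, p_value).
    """
    auc_a, v10_a, v01_a = delong_components(cases_a, controls_a)
    auc_b, v10_b, v01_b = delong_components(cases_b, controls_b)
    m, n = len(cases_a), len(controls_a)

    var_a = _sample_cov(v10_a, v10_a) / m + _sample_cov(v01_a, v01_a) / n
    var_b = _sample_cov(v10_b, v10_b) / m + _sample_cov(v01_b, v01_b) / n
    cov_ab = _sample_cov(v10_a, v10_b) / m + _sample_cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab

    if var_diff <= 0.0:
        # Degenerate case (e.g. identical scores): no evidence of a difference.
        return auc_a, auc_b, 0.0, 1.0

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value
```

Calling `delong_paired_test` with the per-woman scores of two systems returns both AUCs, the z statistic, and a two-sided p-value. The pairwise loops are quadratic in sample size, which is unlikely to matter for a sample of a few hundred women; larger datasets would usually call for a rank-based implementation.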