Keywords: Anatomy, Comparative Studies, Technology Assessment, Transfer Learning

Source: DOI:10.1148/ryai.220097   PDF (PubMed)

Abstract:
Purpose: To assess whether transfer learning with a bidirectional encoder representations from transformers (BERT) model, pretrained on a clinical corpus, can perform sentence-level anatomic classification of free-text radiology reports, even for anatomic classes with few positive examples.
Materials and Methods: This retrospective study included radiology reports of patients who underwent whole-body PET/CT imaging from December 2005 to December 2020. Each sentence in these reports (6272 sentences) was labeled by two annotators according to body part ("brain," "head & neck," "chest," "abdomen," "limbs," "spine," or "others"). The BERT-based transfer learning approach was compared with two baseline machine learning approaches: bidirectional long short-term memory (BiLSTM) and a count-based method. The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) were computed for each approach, and AUCs were compared using the DeLong test.
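To make the classification setup concrete, below is a minimal sketch (not the authors' code) of sentence-level body-part classification with a pretrained BERT encoder via the Hugging Face transformers library. The "bert-base-uncased" checkpoint is a placeholder; the study used a model pretrained on a clinical corpus. Treating the task as one independent binary decision per body part is an assumption consistent with the per-class AUPRC/AUC reported in the abstract.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The seven sentence-level anatomic classes described in the study.
LABELS = ["brain", "head & neck", "chest", "abdomen", "limbs", "spine", "others"]

# Placeholder checkpoint; the paper used a clinical-corpus-pretrained BERT.
CHECKPOINT = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid per label, BCE loss
)

# Illustrative report sentence; the classification head here is randomly
# initialized, so real use requires fine-tuning on the labeled sentences.
sentence = "Increased FDG uptake is seen in the right upper lobe of the lung."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.sigmoid(logits).squeeze(0)  # one probability per body part
for label, p in zip(LABELS, probs):
    print(f"{label}: {p.item():.3f}")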
Results: The BERT-based approach achieved a macro-averaged AUPRC of 0.88, outperforming both baselines. AUC results for BERT were significantly higher than those of BiLSTM for all classes and higher than those of the count-based method for the "brain," "chest," "abdomen," and "others" classes (P values < .025). AUPRC results for BERT were superior to those of the baselines even for classes with few labeled training examples (brain: BERT, 0.95; BiLSTM, 0.11; count based, 0.41; limbs: BERT, 0.74; BiLSTM, 0.28; count based, 0.46; spine: BERT, 0.82; BiLSTM, 0.53; count based, 0.69).
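The sketch below shows how the two metrics named above, per-class AUPRC and ROC AUC with macro-averaging over the seven classes, are typically computed in scikit-learn; average_precision_score is a standard stepwise estimate of AUPRC. The arrays here are random stand-ins, not the study's data.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n_sentences, n_classes = 200, 7
y_true = rng.integers(0, 2, size=(n_sentences, n_classes))  # binary label per class
y_score = rng.random(size=(n_sentences, n_classes))         # model probabilities

auprc = [average_precision_score(y_true[:, k], y_score[:, k]) for k in range(n_classes)]
auc = [roc_auc_score(y_true[:, k], y_score[:, k]) for k in range(n_classes)]

print("macro-averaged AUPRC:", np.mean(auprc))
print("macro-averaged AUC:  ", np.mean(auc))

The DeLong test used to compare AUCs between models is not part of scikit-learn; paired-AUC comparisons like this are usually done with a dedicated implementation.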
Conclusion: The BERT-based transfer learning approach outperformed the BiLSTM and count-based approaches in sentence-level anatomic classification of free-text radiology reports, even for anatomic classes with few labeled training examples.
Keywords: Anatomy, Comparative Studies, Technology Assessment, Transfer Learning
Supplemental material is available for this article. © RSNA, 2023.