Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer

Abstract:

OBJECTIVE: To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP).
METHODS: Deep-transfer-learning-based NLP models were retrospectively trained and tested on serial free-text CT reports of consecutive patients diagnosed with pancreatic cancer at a Korean tertiary hospital, together with the survival information extracted for those patients. Randomly selected patients with pancreatic cancer and their serial CT reports from an independent tertiary hospital in the United States formed the external testing data set. The concordance index (c-index) between predicted and actual survival and the area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated.
RESULTS: Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports and 670 patients with 3,058 CT reports were allocated to the training and internal testing data sets, respectively. The ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained on only the first CT report of each patient showed a c-index of 0.653 and an AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports, starting from the initial report, showed an improved c-index of 0.811 and AUROC of 0.911. On the external testing set of 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed that our model's interpretation was contextual and went beyond specific phrases.
CONCLUSIONS: A deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. The developed model can support clinical decisions using survival information extracted solely from serial radiology reports.
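
The abstract reports two evaluation metrics, the c-index and the AUROC for 1-year survival. As an illustration of what these numbers measure, here is a minimal sketch. It assumes Harrell's formulation of the c-index (the abstract does not say which variant was used) and the pairwise (Mann-Whitney) form of the AUROC. The function names and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index (assumed variant; simplified handling of ties).

    times: observed follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher meaning shorter predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed death.
        # Pairs with identical times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable


def auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative (not study data).
toy_times = [5, 8, 12, 20, 30]          # months of follow-up
toy_events = [1, 1, 0, 1, 0]            # 1 = death observed, 0 = censored
toy_risk = [0.9, 0.6, 0.7, 0.3, 0.1]    # hypothetical model risk scores
toy_died_within_1y = [1, 1, 0, 0, 0]    # hypothetical 1-year outcome labels

print(concordance_index(toy_times, toy_events, toy_risk))
print(auroc(toy_died_within_1y, toy_risk))
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times. The AUROC reads the same way for the binary 1-year outcome. In practice, an established library implementation would normally be preferred over this pairwise loop, which scales quadratically with the number of patients.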
