Performance of an Open-Source Large Language Model in Extracting Information from Free-Text Radiology Reports

Abstract:

Purpose: To assess the performance of a local open-source large language model (LLM) in various information extraction tasks from real-life emergency brain MRI reports.

Materials and Methods: All consecutive emergency brain MRI reports written in 2022 from a French quaternary center were retrospectively reviewed. Two radiologists identified MRI scans that were performed in the emergency department for headaches. Four radiologists scored the reports' conclusions as either normal or abnormal. Abnormalities were labeled as either headache-causing or incidental. Vicuna (LMSYS Org), an open-source LLM, performed the same tasks. Vicuna's performance metrics were evaluated using the radiologists' consensus as the reference standard.

Results: Among the 2398 reports during the study period, radiologists identified 595 that included headaches in the indication (median age of patients, 35 years [IQR, 26-51 years]; 68% [403 of 595] women). A positive finding was reported in 227 of 595 (38%) cases, 136 of which could explain the headache. The LLM had a sensitivity of 98.0% (95% CI: 96.5, 99.0) and specificity of 99.3% (95% CI: 98.8, 99.7) for detecting the presence of headache in the clinical context, a sensitivity of 99.4% (95% CI: 98.3, 99.9) and specificity of 98.6% (95% CI: 92.2, 100.0) for the use of contrast medium injection, a sensitivity of 96.0% (95% CI: 92.5, 98.2) and specificity of 98.9% (95% CI: 97.2, 99.7) for study categorization as either normal or abnormal, and a sensitivity of 88.2% (95% CI: 81.6, 93.1) and specificity of 73% (95% CI: 62, 81) for causal inference between MRI findings and headache.

Conclusion: An open-source LLM was able to extract information from free-text radiology reports with excellent accuracy without requiring further training.

Keywords: Large Language Model (LLM), Generative Pretrained Transformers (GPT), Open Source, Information Extraction, Report, Brain, MRI

Supplemental material is available for this article.
Published under a CC BY 4.0 license. See also the commentary by Akinci D'Antonoli and Bluethgen in this issue.

Summary:

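"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered that could affect the content.

Purpose: To assess the performance of a local open-source large language model (LLM) in various information extraction tasks from real-life emergency brain MRI reports.

Materials and Methods: All consecutive emergency brain MRI reports written in 2022 at a French quaternary center were retrospectively reviewed. Two radiologists identified MRI scans performed for headaches. Four radiologists scored the reports' conclusions as either normal or abnormal. Abnormalities were labeled as either headache-causing or incidental. Vicuna, an open-source LLM, performed the same tasks. Vicuna's performance metrics were evaluated using the radiologists' consensus as the reference standard.

Results: Among the 2398 reports during the study period, radiologists identified 595 that included headaches (median age of patients, 35 years [IQR, 26-51]; 68% [403 of 595] women). A positive finding was reported in 227 of 595 (38%) cases, 136 of which could explain the headache. The LLM had a sensitivity/specificity (95% CI), respectively, of 98% (583/595) (97-99)/99% (1791/1803) (99-100) for detecting the presence of headache in the clinical context, 99% (514/517) (98-100)/99% (68/69) (92-100) for the use of contrast medium injection, 97% (219/227) (93-99)/99% (364/368) (97-100) for study categorization as either normal or abnormal, and 88% (120/136) (82-93)/73% (66/91) (62-81) for causal inference between MRI findings and headache.

Conclusion: An open-source LLM was able to extract information from free-text radiology reports with excellent accuracy without requiring further training.

©RSNA, 2024.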
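
The sensitivity and specificity figures above follow directly from the reported counts (for example, 583/595 and 1791/1803 for headache detection). The Python sketch below recomputes the point estimates from those counts and attaches a Wilson score interval. The abstract does not state which confidence interval method the authors used, so the Wilson interval is an assumption here and its bounds may differ slightly from the published ones. The names `proportion_with_wilson_ci`, `TASKS`, and `summarize` are illustrative and do not come from the article.

```python
from math import sqrt


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a binomial proportion.

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95% coverage.
    The interval method is an assumption, not taken from the article.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Counts as reported in the abstract:
# (true positives, reference positives, true negatives, reference negatives)
TASKS = {
    "headache in clinical context": (583, 595, 1791, 1803),
    "contrast medium injection": (514, 517, 68, 69),
    "normal vs abnormal": (219, 227, 364, 368),
    "causal inference": (120, 136, 66, 91),
}


def summarize(tasks):
    """Compute sensitivity and specificity (with intervals) for each task."""
    rows = {}
    for name, (tp, pos, tn, neg) in tasks.items():
        rows[name] = {
            "sensitivity": proportion_with_wilson_ci(tp, pos),
            "specificity": proportion_with_wilson_ci(tn, neg),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(TASKS).items():
        sens, spec = row["sensitivity"], row["specificity"]
        print(
            f"{name}: sensitivity {sens[0]:.1%} ({sens[1]:.1%}, {sens[2]:.1%}), "
            f"specificity {spec[0]:.1%} ({spec[1]:.1%}, {spec[2]:.1%})"
        )
```

Running the script prints one line per task, which makes it easy to compare the recomputed proportions with the rounded percentages quoted in the text.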