BACKGROUND: Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g., standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that fits the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma.
METHODS: Our work aims to leverage the potential of Natural Language Processing and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 Italian radiology reports, we investigate a rule-free generative Question Answering approach based on the Italian-specific version of T5: IT5. To address discrepancies in information content, we focus on the six most frequently filled items in the annotations made on the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we also encode information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to comply with the context length limitations of IT5. Performance is evaluated in terms of strict accuracy, F1 score, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. Unlike multichoice and factual answers, free-text answers do not have a 1-to-1 correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between the medical annotations and the generated free-text answers, using a 5-point Likert scale questionnaire (evaluating the criteria of correctness and completeness).
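As an illustrative sketch only (not the authors' exact pipeline), the following Python snippet shows how a generative QA query and the batch-splitting step preceding ex-post combination could be set up with the Hugging Face transformers library; the IT5 checkpoint name, the Italian "domanda/contesto" prompt template, the chunk size, and the helper names are assumptions introduced here for illustration.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Assumption: any IT5 seq2seq checkpoint (~220M parameters) stands in
    # for the fine-tuned model used in the study.
    MODEL_NAME = "gsarti/it5-base"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    def answer(question: str, context: str, max_input_tokens: int = 512) -> str:
        """Generate an answer for one (registry question, report chunk) pair."""
        prompt = f"domanda: {question} contesto: {context}"  # hypothetical prompt template
        inputs = tokenizer(prompt, truncation=True, max_length=max_input_tokens,
                           return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=64)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    def answer_per_chunk(question: str, report: str, chunk_tokens: int = 400) -> list[str]:
        """Split a long report into chunks that fit the context window (batch splitting),
        query each chunk, and return per-chunk answers to be merged ex post."""
        ids = tokenizer(report, add_special_tokens=False)["input_ids"]
        chunks = [tokenizer.decode(ids[i:i + chunk_tokens])
                  for i in range(0, len(ids), chunk_tokens)]
        return [answer(question, chunk) for chunk in chunks]

The per-chunk answers would then be combined ex post, e.g., by discarding "no answer" outputs and reconciling the remaining ones, which is how a single registry item can draw on information from more than one place in the report.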
RESULTS: The combination of fine-tuning and batch splitting allows IT5 ex-post combination to achieve notable results in extracting different types of structured data, performing on par with GPT-3.5. Human-based assessment scores of the free-text answers show a high correlation with the F1 performance metric (Spearman's correlation coefficients > 0.5, p-values < 0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible, human-like statements, although it systematically provides answers even when none are supposed to be given.
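A minimal sketch of how such a correlation could be computed, assuming per-answer token-level F1 scores and the corresponding 5-point Likert ratings are available as parallel lists (the values below are hypothetical placeholders, not study data):

    from scipy.stats import spearmanr

    # Hypothetical per-answer scores: token-level F1 against the reference annotation
    # and the 5-point Likert rating assigned by a human expert to the same answer.
    f1_scores = [0.91, 0.40, 0.75, 0.10, 0.88, 0.62, 0.55, 0.97]
    likert_scores = [5, 2, 4, 1, 5, 3, 3, 5]

    rho, p_value = spearmanr(f1_scores, likert_scores)
    print(f"Spearman's rho = {rho:.2f}, p-value = {p_value:.3g}")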
CONCLUSIONS: In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220M) performs well as a clinical information extraction system for the automatic SR registry filling task. It can extract information from more than one place in the report and elaborate it in a manner that complies with the response specifications provided by the SR registry (for multichoice and factual items) or that closely approximates the work of a human expert (free-text items), and it can discern whether or not an answer is supposed to be given in response to a user query.