Keywords: Artificial intelligence; ChatGPT; Large language models; Musculoskeletal; Patient education; Report

MeSH: Humans; Radiology Information Systems; Musculoskeletal Diseases / diagnostic imaging; Feasibility Studies; Translating; Comprehension

Source: DOI:10.1007/s00256-024-04599-2

Abstract:
OBJECTIVE: To assess the feasibility of using large language models (LLMs), specifically ChatGPT-4, to generate concise and accurate layperson summaries of musculoskeletal radiology reports.
METHODS: Sixty radiology reports, comprising 20 MR shoulder, 20 MR knee, and 20 MR lumbar spine reports, were obtained via PACS. The reports were deidentified and then submitted to ChatGPT-4, with the prompt "Produce an organized and concise layperson summary of the findings of the following radiology report. Target a reading level of 8-9th grade and word count <300 words." Three (two primary and one later added for validation) independent readers evaluated the summaries for completeness and accuracy compared to the original reports. Summaries were rated on a scale of 1 to 3: 1) summaries that were incorrect or incomplete, potentially providing harmful or confusing information; 2) summaries that were mostly correct and complete, unlikely to cause confusion or harm; and 3) summaries that were entirely correct and complete.
RESULTS: All 60 responses met the criteria for word count and readability. Mean ratings for accuracy were 2.58 for reader 1, 2.71 for reader 2, and 2.77 for reader 3. Mean ratings for completeness were 2.87 for reader 1, 2.73 for reader 2, and 2.87 for reader 3. For accuracy, reader 1 rated three summaries as a 1, reader 2 rated one as a 1, and reader 3 rated none as a 1. For the two primary readers, inter-reader agreement was low for accuracy (kappa 0.33) and completeness (kappa 0.29). There were no statistically significant changes in inter-reader agreement when the third reader's ratings were included in the analysis.
CONCLUSIONS: Overall ratings for accuracy and completeness of the AI-generated layperson report summaries were high with only a small minority likely to be confusing or inaccurate. This study illustrates the potential for leveraging generative AI, such as ChatGPT-4, to automate the production of patient-friendly summaries for musculoskeletal MR imaging.
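The inter-reader agreement reported in RESULTS is a kappa statistic. The abstract does not say which variant was used or how it was computed, so the following is only a minimal sketch of unweighted Cohen's kappa for two readers, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohen_kappa` and the example ratings are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the study's exact kappa variant (for example weighted
    versus unweighted) is not stated in the abstract; this is the plain
    unweighted form.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: fraction of items given identical ratings.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from each rater's marginal category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings on the 1-3 scale described in METHODS.
reader_1 = [3, 3, 2, 3, 1, 2, 3, 3, 2, 3]
reader_2 = [3, 2, 2, 3, 2, 3, 3, 3, 3, 3]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

Because most ratings cluster at 3, chance agreement p_e is high, which is one reason kappa can be low even when the mean ratings of the readers look similar.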