Keywords: Artificial intelligence; ChatGPT; Decision support systems; Large language model; OpenAI

MeSH: Humans; Radiology; Forecasting

Source: DOI:10.1016/j.diii.2024.04.003

Abstract:
OBJECTIVE: The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.
METHODS: PubMed, Web of Science, Embase, and Google Scholar were comprehensively searched for published studies applying ChatGPT to clinical radiology, up to January 1, 2024.
RESULTS: Of 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, while seven (7/44; 15.9%) reported lower performance in providing information for diagnosis and clinical decision support (6/44; 13.6%) or patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies quantified ChatGPT's performance. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and five (5/24; 20.8%) studies reported a median agreement of 83.6% between ChatGPT outputs and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared the two most recent ChatGPT versions, and in ten (10/11; 90.9%) ChatGPT-4 outperformed ChatGPT-3.5, showing notable gains in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.
CONCLUSIONS: Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, multiple pitfalls and limitations remain to be addressed. It is too soon to confirm its complete proficiency and accuracy, and larger multicenter studies using diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.