Keywords: ChatGPT; GPT-3; GPT-4; NLP; artificial intelligence; clinical decision support systems; clinical practice; extract; extraction; health care; healthcare; language model; large language models; machine learning; natural language processing; review methodology; review methods; systematic; systematic review; text; unstructured

Source: DOI:10.2196/48933

Abstract:
BACKGROUND: This research integrates a comparative analysis of the performance of human researchers and OpenAI's ChatGPT in systematic review tasks and describes an assessment of the application of natural language processing (NLP) models in clinical practice through a review of 5 studies.
OBJECTIVE: This study aimed to evaluate the reliability between ChatGPT and human researchers in extracting key information from clinical articles, and to investigate the practical use of NLP in clinical settings as evidenced by selected studies.
METHODS: The study design comprised a systematic review of clinical articles executed independently by human researchers and ChatGPT. The level of agreement between and within raters for parameter extraction was assessed using the Fleiss and Cohen κ statistics.
RESULTS: The comparative analysis revealed a high degree of concordance between ChatGPT and human researchers for most parameters, with less agreement for study design, clinical task, and clinical implementation. The review identified 5 significant studies that demonstrated the diverse applications of NLP in clinical settings. These studies' findings highlight the potential of NLP to improve clinical efficiency and patient outcomes in various contexts, from enhancing allergy detection and classification to improving quality metrics in psychotherapy treatments for veterans with posttraumatic stress disorder.
CONCLUSIONS: Our findings underscore the potential of NLP models, including ChatGPT, in performing systematic reviews and other clinical tasks. Despite certain limitations, NLP models present a promising avenue for enhancing health care efficiency and accuracy. Future studies must focus on broadening the range of clinical applications and exploring the ethical considerations of implementing NLP applications in health care settings.
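
The METHODS section names the Fleiss and Cohen κ statistics but does not say how they were computed. As a rough illustration only, the sketch below implements the textbook definitions of both statistics in plain Python. The function names and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists the labels every rater gave item i."""
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    label_totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        label_totals.update(counts)
        # Per-item agreement: fraction of rater pairs that agree.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing / (n_raters * (n_raters - 1))
    p_bar /= n_items
    # Chance agreement from the pooled label proportions.
    total = n_items * n_raters
    p_e = sum((t / total) ** 2 for t in label_totals.values())
    if p_e == 1.0:
        return 1.0  # only one label was ever used
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical study-design labels from a human reviewer and a model.
    human = ["RCT", "RCT", "cohort", "cohort"]
    model = ["RCT", "cohort", "cohort", "cohort"]
    print(cohen_kappa(human, model))
```

Cohen's κ compares exactly two raters, for example one human and ChatGPT. Fleiss' κ generalizes the same chance-corrected idea to any fixed number of raters per item, which is why a study with several human reviewers might report both.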