BACKGROUND: Subject screening is a key aspect of all clinical trials; however, it has traditionally been a labor-intensive and error-prone task demanding significant time and resources. With the advent of large language models (LLMs) and related technologies, a paradigm shift in natural language processing capabilities offers a promising avenue for improving both the quality and efficiency of screening efforts. This study aimed to test a Retrieval-Augmented Generation (RAG) process built on Generative Pretrained Transformer Version 4 (GPT-4) to accurately identify and report on inclusion and exclusion criteria for a clinical trial.
METHODS: The Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) trial aims to recruit patients with symptomatic heart failure. As part of the screening process, a list of potentially eligible patients is created through an electronic health record (EHR) query. Currently, structured data in the EHR can be used to determine only 5 of the 6 inclusion criteria and 5 of the 17 exclusion criteria. Trained, but non-licensed, study staff complete a manual chart review to determine patient eligibility and record their assessment of the inclusion and exclusion criteria. We obtained the structured assessments completed by study staff and the clinical notes from the past two years and developed a clinical note-based question-answering workflow powered by a RAG architecture and GPT-4, which we named RECTIFIER (RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review). We used notes from 100 patients as a development dataset, 282 patients as a validation dataset, and 1894 patients as a test set. An expert clinician completed a blinded review of patients' charts to answer the eligibility questions and establish the "gold standard" answers. We calculated the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) for each question and screening method, and performed bootstrapping to calculate confidence intervals for each statistic.
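The abstract does not give RECTIFIER's implementation details, but the core RAG pattern it describes, retrieving the note passages most relevant to a criterion and asking GPT-4 a yes/no question about them, can be sketched as follows. The chunking strategy, embedding model, prompt wording, and function names are illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch of a RAG-style eligibility question-answering step.
# Assumptions (not from the paper): OpenAI Python SDK v1, cosine-similarity
# retrieval over fixed-size note chunks, and a strict yes/no prompt format.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_notes(notes: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Split a patient's concatenated clinical notes into overlapping chunks."""
    return [notes[i:i + size] for i in range(0, len(notes), size - overlap)]

def embed(texts: list[str]) -> np.ndarray:
    """Embed texts; the embedding model choice here is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def answer_criterion(notes: str, question: str, top_k: int = 5) -> str:
    """Retrieve the chunks most relevant to one criterion and ask GPT-4."""
    chunks = chunk_notes(notes)
    chunk_vecs, q_vec = embed(chunks), embed([question])[0]
    # Cosine similarity between the criterion question and each note chunk.
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n---\n".join(chunks[i] for i in np.argsort(sims)[-top_k:])
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are screening a patient for a clinical trial. "
                        "Answer strictly Yes or No based only on the notes."},
            {"role": "user",
             "content": f"Notes:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# Example with one of the criteria the study highlights:
# answer_criterion(patient_notes, "Does the patient have symptomatic heart failure?")
```

In a screening pipeline such as the one described, a loop of this kind would run once per inclusion and exclusion question for each patient on the EHR-generated candidate list.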
RESULTS: Both RECTIFIER and study staff answers closely aligned with the expert clinician answers across criteria, with accuracy ranging from 97.9% to 100% (MCC 0.837 to 1) for RECTIFIER and from 91.7% to 100% (MCC 0.644 to 1) for study staff. RECTIFIER performed better than study staff in determining the inclusion criterion of "symptomatic heart failure," with an accuracy of 97.9% vs 91.7% and an MCC of 0.924 vs 0.721, respectively. Overall, the sensitivity and specificity of determining eligibility were 92.3% (CI) and 93.9% (CI) for RECTIFIER and 90.1% (CI) and 83.6% (CI) for study staff, respectively.
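As an illustration of how the reported statistics can be computed, the sketch below derives sensitivity, specificity, accuracy, and MCC from paired gold-standard/predicted eligibility labels and uses a percentile bootstrap for the confidence intervals. The resample count and 95% level are assumptions; the abstract does not state them.

```python
# Sketch of the evaluation statistics named in the abstract: sensitivity,
# specificity, accuracy, MCC, and bootstrap percentile confidence intervals.
# The 2000 resamples and 95% CI level are assumptions, not from the paper.
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

def screening_metrics(gold: np.ndarray, pred: np.ndarray) -> dict[str, float]:
    """Binary classification metrics against the clinician gold standard."""
    tn, fp, fn, tp = confusion_matrix(gold, pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "mcc": matthews_corrcoef(gold, pred),
    }

def bootstrap_ci(gold, pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for each metric, resampling patients."""
    rng = np.random.default_rng(seed)
    gold, pred = np.asarray(gold), np.asarray(pred)
    samples = {k: [] for k in ("sensitivity", "specificity", "accuracy", "mcc")}
    for _ in range(n_boot):
        idx = rng.integers(0, len(gold), len(gold))
        if gold[idx].min() == gold[idx].max():
            continue  # skip degenerate resamples lacking both classes
        for k, v in screening_metrics(gold[idx], pred[idx]).items():
            samples[k].append(v)
    return {k: (np.percentile(v, 100 * alpha / 2),
                np.percentile(v, 100 * (1 - alpha / 2)))
            for k, v in samples.items()}

# Example with dummy labels (1 = eligible, 0 = not eligible):
gold = np.array([1, 1, 0, 0, 1, 0, 1, 0])
pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
print(screening_metrics(gold, pred))
print(bootstrap_ci(gold, pred))
```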
CONCLUSIONS: GPT-4-based solutions have the potential to improve efficiency and reduce costs in clinical trial screening. When incorporating new tools such as RECTIFIER, it is important to consider the potential hazards of automating the screening process and to set up appropriate mitigation strategies, such as a final clinician review before patient engagement.