Keywords: GPT-3.5, GPT-4, LLAMA, clinical trial matching, distillation, large language models

Source: DOI: 10.1093/jamia/ocae073

Abstract:
OBJECTIVE: The objective of this study is to systematically examine the efficacy of both proprietary (GPT-3.5, GPT-4) and open-source (LLAMA 7B, 13B, 70B) large language models (LLMs) in the context of matching patients to clinical trials in healthcare.
METHODS: The study employs a multifaceted evaluation framework, incorporating extensive automated and human-centric assessments along with a detailed error analysis for each model, and assesses LLMs' capabilities in analyzing patient eligibility against clinical trials' inclusion and exclusion criteria (a minimal illustrative sketch of this criterion-level assessment follows the abstract). To improve the adaptability of open-source LLMs, a specialized synthetic dataset was created using GPT-4, facilitating effective fine-tuning under constrained data conditions.
RESULTS: The findings indicate that open-source LLMs, when fine-tuned on this limited and synthetic dataset, achieve performance parity with their proprietary counterparts, such as GPT-3.5.
DISCUSSION: This study highlights the recent success of LLMs in the high-stakes domain of healthcare, specifically in patient-trial matching. The research demonstrates the potential of open-source models to match the performance of proprietary models when fine-tuned appropriately, addressing challenges such as cost, privacy, and reproducibility concerns associated with closed-source proprietary LLMs.
CONCLUSIONS: The study underscores the opportunity for open-source LLMs in patient-trial matching. To encourage further research and applications in this field, the annotated evaluation dataset and the fine-tuned LLM, Trial-LLAMA, are released for public use.
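The following is a minimal, purely illustrative Python sketch of the kind of criterion-level eligibility check summarized in the METHODS above: prompting an LLM to judge a patient note against a single inclusion or exclusion criterion and aggregating the per-criterion labels. The model name, prompt wording, label set, and aggregation rule are assumptions made for illustration and are not the paper's actual prompts or pipeline.

```python
# Illustrative sketch only: ask an LLM to label a patient note against one
# trial criterion. Prompt text, labels, and model name are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ("MET", "NOT_MET", "INSUFFICIENT_INFORMATION")

def judge_criterion(patient_note: str, criterion: str, criterion_type: str) -> str:
    """Return one coarse eligibility label for a single inclusion/exclusion criterion."""
    prompt = (
        "You are assessing clinical trial eligibility.\n"
        f"Patient note:\n{patient_note}\n\n"
        f"{criterion_type} criterion:\n{criterion}\n\n"
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the study also evaluates GPT-3.5 and fine-tuned LLAMA models
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().upper()
    return answer if answer in LABELS else "INSUFFICIENT_INFORMATION"

if __name__ == "__main__":
    # Toy aggregation rule (an assumption): eligible only if every inclusion
    # criterion is MET and no exclusion criterion is MET.
    note = "67-year-old male with type 2 diabetes, eGFR 55, no prior chemotherapy."
    inclusion = ["Adults aged 18 or older with type 2 diabetes"]
    exclusion = ["Prior chemotherapy within the last 6 months"]
    inc_ok = all(judge_criterion(note, c, "Inclusion") == "MET" for c in inclusion)
    exc_hit = any(judge_criterion(note, c, "Exclusion") == "MET" for c in exclusion)
    print("eligible" if inc_ok and not exc_hit else "not eligible / needs review")
```

The same criterion-level prompting pattern could, under the paper's distillation setup, be pointed at a fine-tuned open-source model instead of a proprietary API; the aggregation step is deliberately simple here and real matching systems typically surface per-criterion evidence for human review.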