Keywords: GPT-3.5, GPT-4, LLAMA, clinical trial matching, distillation, large language models

Source: DOI: 10.1093/jamia/ocae073

Abstract:
OBJECTIVE: The objective of this study is to systematically examine the efficacy of both proprietary (GPT-3.5, GPT-4) and open-source (LLAMA 7B, 13B, 70B) large language models (LLMs) in the context of matching patients to clinical trials in healthcare.
METHODS: The study employs a multifaceted evaluation framework, incorporating extensive automated and human-centric assessments along with a detailed error analysis for each model, and assesses LLMs' capabilities in analyzing patient eligibility against clinical trials' inclusion and exclusion criteria (a minimal illustrative sketch of this criterion-level assessment follows the abstract). To improve the adaptability of open-source LLMs, a specialized synthetic dataset was created using GPT-4, facilitating effective fine-tuning under constrained data conditions.
RESULTS: The findings indicate that open-source LLMs, when fine-tuned on this limited and synthetic dataset, achieve performance parity with their proprietary counterparts, such as GPT-3.5.
DISCUSSION: This study highlights the recent success of LLMs in the high-stakes domain of healthcare, specifically in patient-trial matching. The research demonstrates the potential of open-source models to match the performance of proprietary models when fine-tuned appropriately, addressing challenges such as cost, privacy, and reproducibility concerns associated with closed-source proprietary LLMs.
CONCLUSIONS: The study underscores the opportunity for open-source LLMs in patient-trial matching. To encourage further research and applications in this field, the annotated evaluation dataset and the fine-tuned LLM, Trial-LLAMA, are released for public use.
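The following is a minimal, purely illustrative Python sketch of the kind of criterion-level eligibility check summarized in the METHODS above: prompting an LLM to judge a patient note against a single inclusion or exclusion criterion and aggregating the per-criterion labels. The model name, prompt wording, label set, and aggregation rule are assumptions made for illustration and are not the paper's actual prompts or pipeline.

```python
# Illustrative sketch only: ask an LLM to label a patient note against one
# trial criterion. Prompt text, labels, and model name are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ("MET", "NOT_MET", "INSUFFICIENT_INFORMATION")

def judge_criterion(patient_note: str, criterion: str, criterion_type: str) -> str:
    """Return one coarse eligibility label for a single inclusion/exclusion criterion."""
    prompt = (
        "You are assessing clinical trial eligibility.\n"
        f"Patient note:\n{patient_note}\n\n"
        f"{criterion_type} criterion:\n{criterion}\n\n"
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the study also evaluates GPT-3.5 and fine-tuned LLAMA models
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().upper()
    return answer if answer in LABELS else "INSUFFICIENT_INFORMATION"

if __name__ == "__main__":
    # Toy aggregation rule (an assumption): eligible only if every inclusion
    # criterion is MET and no exclusion criterion is MET.
    note = "67-year-old male with type 2 diabetes, eGFR 55, no prior chemotherapy."
    inclusion = ["Adults aged 18 or older with type 2 diabetes"]
    exclusion = ["Prior chemotherapy within the last 6 months"]
    inc_ok = all(judge_criterion(note, c, "Inclusion") == "MET" for c in inclusion)
    exc_hit = any(judge_criterion(note, c, "Exclusion") == "MET" for c in exclusion)
    print("eligible" if inc_ok and not exc_hit else "not eligible / needs review")
```

The same criterion-level prompting pattern could, under the paper's distillation setup, be pointed at a fine-tuned open-source model instead of a proprietary API; the aggregation step is deliberately simple here and real matching systems typically surface per-criterion evidence for human review.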