Keywords: ChatGPT; biomedical NLP; generative language models; large language models

Source: DOI:10.1093/jamia/ocae045

Abstract:
OBJECTIVE: Recently, large language models (LLMs) have showcased remarkable capabilities in natural language understanding. Although they demonstrate proficiency in everyday conversations and question-answering (QA) scenarios, these models frequently struggle in domains that demand precision, such as medical applications, because they lack domain-specific knowledge. In this article, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
METHODS: We adapt a general-purpose LLM toward the medical domain through data-centric knowledge injection, integrating 4.8M biomedical academic papers and 30K medical textbooks, followed by comprehensive domain-specific instruction fine-tuning on 202M tokens covering medical QA, reasoning rationales, and conversational dialogues.
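To make this two-stage recipe concrete, here is a minimal sketch assuming standard HuggingFace transformers/datasets tooling: stage 1 continues causal-LM pretraining on biomedical text for knowledge injection, and stage 2 fine-tunes on instruction data with the same objective. The base checkpoint name, corpus file paths, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the two-stage adaptation described in METHODS:
# (1) continued causal-LM pretraining on biomedical text, then
# (2) instruction fine-tuning on rendered prompt/response text.
# All paths, model names, and hyperparameters are illustrative assumptions.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "huggyllama/llama-13b"  # assumed base checkpoint (paper adapts LLaMA-13B)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def tokenize(batch):
    # Truncate raw text into fixed-length causal-LM training examples.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Stage 1: knowledge injection on plain-text papers and textbooks
# (hypothetical JSONL file with a "text" field per document).
corpus = load_dataset("json", data_files="biomedical_corpus.jsonl")["train"]
corpus = corpus.map(tokenize, batched=True, remove_columns=corpus.column_names)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="stage1_pretrain", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=corpus,
    data_collator=collator,
).train()

# Stage 2: instruction tuning on QA, reasoning rationales, and dialogues,
# each pre-rendered into a single "text" field and trained with the same objective.
instructions = load_dataset("json", data_files="medical_instructions.jsonl")["train"]
instructions = instructions.map(tokenize, batched=True,
                                remove_columns=instructions.column_names)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="stage2_instruct", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=instructions,
    data_collator=collator,
).train()
```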
RESULTS: On various public medical QA benchmarks and in manual rating, our lightweight PMC-LLaMA, which has only 13B parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets for instruction tuning will be released to the research community.
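The benchmark scores referenced above are typically accuracy on multiple-choice medical QA (e.g., USMLE-style questions). A minimal sketch of such an evaluation loop follows; the checkpoint name, prompt template, and answer-letter parsing are assumptions for illustration, not the paper's exact protocol.

```python
# Hypothetical multiple-choice medical QA evaluation by accuracy.
# Checkpoint name, prompt template, and answer parsing are assumptions.

import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CKPT = "axiong/PMC_LLaMA_13B"  # assumed released checkpoint name

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.float16, device_map="auto"
)

def answer(question: str, options: dict) -> str:
    # Render a USMLE-style prompt and greedily generate a short completion.
    opts = "\n".join(f"{k}. {v}" for k, v in options.items())
    prompt = f"Question: {question}\nOptions:\n{opts}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    match = re.search(r"\b([A-E])\b", completion)  # first option letter emitted
    return match.group(1) if match else ""

def accuracy(benchmark: list) -> float:
    # Each item: {"question": ..., "options": {"A": ..., ...}, "gold": "B"}
    correct = sum(answer(x["question"], x["options"]) == x["gold"] for x in benchmark)
    return correct / len(benchmark)
```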
DISCUSSION: Our contributions are threefold: (1) we build an open-source LLM for the medical domain; we believe the proposed PMC-LLaMA model can promote further development of foundation models in medicine, serving as a trainable generative language backbone for medical applications; (2) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component, showing how different training data and model scales affect medical LLMs; (3) we contribute a large-scale, comprehensive dataset for instruction tuning.
CONCLUSION: In this article, we systematically investigate the process of building an open-source, medical-specific LLM, PMC-LLaMA.