Keywords: ChatGPT; biomedical NLP; generative language models; large language models

Source: DOI:10.1093/jamia/ocae045

Abstract:
OBJECTIVE: Recently, large language models (LLMs) have showcased remarkable capabilities in natural language understanding. Although they demonstrate proficiency in everyday conversations and question-answering (QA) scenarios, these models frequently struggle in domains that demand precision, such as medical applications, because they lack domain-specific knowledge. In this article, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
METHODS: We adapt a general-purpose LLM toward the medical domain through data-centric knowledge injection, integrating 4.8M biomedical academic papers and 30K medical textbooks, followed by comprehensive domain-specific instruction fine-tuning on 202M tokens covering medical QA, reasoning rationales, and conversational dialogues.
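To make this two-stage recipe concrete, here is a minimal sketch assuming standard HuggingFace transformers/datasets tooling: stage 1 continues causal-LM pretraining on biomedical text for knowledge injection, and stage 2 fine-tunes on instruction data with the same objective. The base checkpoint name, corpus file paths, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the two-stage adaptation described in METHODS:
# (1) continued causal-LM pretraining on biomedical text, then
# (2) instruction fine-tuning on rendered prompt/response text.
# All paths, model names, and hyperparameters are illustrative assumptions.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "huggyllama/llama-13b"  # assumed base checkpoint (paper adapts LLaMA-13B)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def tokenize(batch):
    # Truncate raw text into fixed-length causal-LM training examples.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Stage 1: knowledge injection on plain-text papers and textbooks
# (hypothetical JSONL file with a "text" field per document).
corpus = load_dataset("json", data_files="biomedical_corpus.jsonl")["train"]
corpus = corpus.map(tokenize, batched=True, remove_columns=corpus.column_names)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="stage1_pretrain", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=corpus,
    data_collator=collator,
).train()

# Stage 2: instruction tuning on QA, reasoning rationales, and dialogues,
# each pre-rendered into a single "text" field and trained with the same objective.
instructions = load_dataset("json", data_files="medical_instructions.jsonl")["train"]
instructions = instructions.map(tokenize, batched=True,
                                remove_columns=instructions.column_names)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="stage2_instruct", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=instructions,
    data_collator=collator,
).train()
```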
RESULTS: On various public medical QA benchmarks and in manual rating, our lightweight PMC-LLaMA, which has only 13B parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets for instruction tuning will be released to the research community.
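The benchmark scores referenced above are typically accuracy on multiple-choice medical QA (e.g., USMLE-style questions). A minimal sketch of such an evaluation loop follows; the checkpoint name, prompt template, and answer-letter parsing are assumptions for illustration, not the paper's exact protocol.

```python
# Hypothetical multiple-choice medical QA evaluation by accuracy.
# Checkpoint name, prompt template, and answer parsing are assumptions.

import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CKPT = "axiong/PMC_LLaMA_13B"  # assumed released checkpoint name

tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.float16, device_map="auto"
)

def answer(question: str, options: dict) -> str:
    # Render a USMLE-style prompt and greedily generate a short completion.
    opts = "\n".join(f"{k}. {v}" for k, v in options.items())
    prompt = f"Question: {question}\nOptions:\n{opts}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    match = re.search(r"\b([A-E])\b", completion)  # first option letter emitted
    return match.group(1) if match else ""

def accuracy(benchmark: list) -> float:
    # Each item: {"question": ..., "options": {"A": ..., ...}, "gold": "B"}
    correct = sum(answer(x["question"], x["options"]) == x["gold"] for x in benchmark)
    return correct / len(benchmark)
```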
DISCUSSION: Our contributions are threefold: (1) we build an open-source LLM for the medical domain; we believe the proposed PMC-LLaMA model can promote further development of foundation models in medicine, serving as a trainable generative language backbone for medical applications; (2) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component, showing how different training data and model scales affect medical LLMs; (3) we contribute a large-scale, comprehensive dataset for instruction tuning.
CONCLUSION: In this article, we systematically investigate the process of building an open-source, medical-specific LLM, PMC-LLaMA.