BACKGROUND: Recent enhancements in Large Language Models (LLMs) such as ChatGPT have exponentially increased user adoption. These models are accessible on mobile devices and support multimodal interactions, including conversations, code generation, and patient image uploads, broadening their utility in providing healthcare professionals with real-time support for clinical decision-making. Nevertheless, many authors have highlighted serious risks that may arise from the adoption of LLMs, principally related to safety and alignment with ethical guidelines.
OBJECTIVE: To address these challenges, we introduce a novel methodological approach for assessing the feasibility of adopting LLMs within a specific healthcare area, with a focus on clinical nursing; it evaluates their performance and thereby guides their selection. Emphasizing LLMs' adherence to scientific advancements, this approach prioritizes safety and care personalization, in line with the Organization for Economic Co-operation and Development (OECD) frameworks for responsible AI. Moreover, its dynamic nature is designed to adapt to future evolutions of LLMs.
METHODS: Through the integration of advanced multidisciplinary knowledge, including Nursing Informatics, and aided by a prospective literature review, seven key domains and specific evaluation items were identified. A peer review by experts in Nursing and AI was performed, ensuring scientific rigor and breadth of insights for an essential, reproducible, and coherent methodological approach. Using a 7-point Likert scale, thresholds were defined to classify LLMs as "unusable", "usable with high caution", or "recommended". Nine state-of-the-art LLMs were evaluated with this methodology in clinical oncology nursing decision-making, producing preliminary results. Gemini Advanced, Anthropic Claude 3, and ChatGPT 4 achieved the minimum score in the State of the Art Alignment & Safety domain required for classification as "recommended" and were also endorsed across all domains. LLAMA 3 70B and ChatGPT 3.5 were classified as "usable with high caution". The others were classified as "unusable" in this domain.
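The threshold-based classification described in the methods can be illustrated with a minimal sketch. The abstract does not report the actual threshold values, the aggregation rule, or the names of the other six domains, so `RECOMMENDED_MIN`, `CAUTION_MIN`, the use of per-domain means, and the placeholder domain name are all assumptions made for illustration and are not the authors' scoring procedure.

```python
from statistics import mean

# Hypothetical cut-offs on the 1-7 Likert scale (not taken from the paper).
RECOMMENDED_MIN = 6.0
CAUTION_MIN = 4.0

# The only domain named in the abstract; it acts as the gating domain here.
SAFETY_DOMAIN = "State of the Art Alignment & Safety"


def classify(domain_items):
    """Classify one LLM from its item ratings.

    domain_items maps a domain name to a list of 1-7 Likert ratings.
    Assumed rule: average the items per domain, then gate on the safety domain.
    """
    for domain, ratings in domain_items.items():
        if not ratings or any(not 1 <= r <= 7 for r in ratings):
            raise ValueError(f"invalid Likert ratings for domain {domain!r}")

    scores = {domain: mean(ratings) for domain, ratings in domain_items.items()}
    safety = scores[SAFETY_DOMAIN]
    lowest = min(scores.values())

    # "Recommended" needs the safety minimum and no weak domain (assumed rule).
    if safety >= RECOMMENDED_MIN and lowest >= CAUTION_MIN:
        return "recommended"
    if safety >= CAUTION_MIN:
        return "usable with high caution"
    return "unusable"


if __name__ == "__main__":
    example = {
        SAFETY_DOMAIN: [7, 6, 6],
        "Domain B (placeholder)": [5, 6],
    }
    print(classify(example))
```

Under these assumed cut-offs, a model whose safety-domain mean falls below `CAUTION_MIN` is classified as "unusable" regardless of how well it scores elsewhere, which mirrors the gating role the abstract gives to the Alignment & Safety domain.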
CONCLUSIONS: The identification of a recommended LLM for a specific healthcare area, combined with its critical, prudent, and integrative use, can support healthcare professionals in decision-making processes.