BACKGROUND: The world has witnessed increased adoption of large language models (LLMs) in the last year. Although the products developed using LLMs have the potential to solve accessibility and efficiency problems in health care, there is a lack of available guidelines for developing LLMs for health care, especially for medical education.
OBJECTIVE: The aim of this study was to identify and prioritize the enablers for developing successful LLMs for medical education. We further evaluated the relationships among these identified enablers.
METHODS: A narrative review of the extant literature was first performed to identify the key enablers for LLM development. We additionally gathered the opinions of LLM users to determine the relative importance of these enablers using the analytic hierarchy process (AHP), a multicriteria decision-making method. Further, total interpretive structural modeling (TISM) was used to analyze the perspectives of product developers and ascertain the relationships and hierarchy among these enablers. Finally, the cross-impact matrix-based multiplication applied to a classification (MICMAC) approach was used to determine the relative driving and dependence powers of these enablers. A nonprobabilistic purposive sampling approach was used to recruit the focus groups.
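As an illustration of the AHP step, the following is a minimal sketch of how priority weights and a consistency ratio can be derived from a pairwise comparison matrix. It is not the authors' implementation: the 3 x 3 matrix is hypothetical, the function names are invented for this example, and the random index values are the commonly cited Saaty table, assumed here rather than taken from the study.

```python
# Minimal AHP sketch (assumption: principal-eigenvector method via power iteration).
# The comparison matrix below is hypothetical and is NOT data from the study.

# Commonly cited Saaty random consistency index values, keyed by matrix size.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(matrix, iterations=100):
    """Return (priority weights, estimated principal eigenvalue)."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [p / total for p in product]  # renormalize so weights sum to 1
    # At convergence, (A w)_i / w_i equals the principal eigenvalue for every i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgments: entry [i][j] states how much more important
# enabler i is than enabler j on Saaty's 1-9 scale.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, lambda_max = ahp_priorities(comparisons)
cr = consistency_ratio(lambda_max, len(comparisons))
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(cr, 4))
```

A consistency ratio below 0.1 is the conventional cutoff for accepting a set of pairwise judgments.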
RESULTS: The AHP demonstrated that the most important enabler for LLMs was credibility, with a priority weight of 0.37, followed by accountability (0.27642) and fairness (0.10572). In contrast, usability, with a priority weight of 0.04, showed negligible importance. The results of TISM concurred with the findings of the AHP. The only striking difference between the expert perspectives and the user preference evaluation was that the product developers rated cost as the least important potential enabler. The MICMAC analysis suggested that cost has a strong influence on other enablers. The inputs of the focus group were found to be reliable, with a consistency ratio of 0.084, which is below the 0.1 threshold.
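To make the MICMAC notions of driving and dependence power concrete, here is a small sketch under stated assumptions. The three enablers and their influence links are placeholders, not the study's reachability matrix, and the midpoint rule used to assign clusters is one common convention assumed for illustration.

```python
# Minimal MICMAC sketch. The influence matrix is hypothetical and is NOT
# the reachability matrix from the study; names are placeholders.


def transitive_closure(adjacency):
    """Build a final reachability matrix (self-reachable, transitive) via Warshall's algorithm."""
    n = len(adjacency)
    reach = [[1 if (i == j or adjacency[i][j]) else 0 for j in range(n)]
             for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def micmac(reach, names):
    """Return {name: (driving power, dependence power, cluster)}."""
    n = len(reach)
    midpoint = n / 2  # assumed cutoff between "low" and "high" power
    result = {}
    for i, name in enumerate(names):
        driving = sum(reach[i])                          # row sum
        dependence = sum(reach[j][i] for j in range(n))  # column sum
        if driving > midpoint and dependence > midpoint:
            cluster = "linkage"
        elif driving > midpoint:
            cluster = "independent (driver)"
        elif dependence > midpoint:
            cluster = "dependent"
        else:
            cluster = "autonomous"
        result[name] = (driving, dependence, cluster)
    return result


# Hypothetical direct influences: enabler_a -> enabler_b -> enabler_c.
names = ["enabler_a", "enabler_b", "enabler_c"]
direct = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
powers = micmac(transitive_closure(direct), names)
for name, (driving, dependence, cluster) in powers.items():
    print(name, driving, dependence, cluster)
```

In this toy chain, the enabler at the start has high driving power and low dependence power, which is the pattern the abstract describes when it says an enabler strongly influences the others.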
CONCLUSIONS: This study is the first to identify, prioritize, and analyze the relationships among the enablers of effective LLMs for medical education. Based on the results of this study, we developed a comprehensible prescriptive framework, named CUC-FATE (Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability), for evaluating the enablers of LLMs in medical education. The study findings are useful for health care professionals, health technology experts, medical technology regulators, and policy makers.