Keywords: AIPI; ChatGPT; GEMINI; Large Language Models (LLM); QAMAI; maxillofacial; maxillofacial surgery; trauma; triage

Source: DOI:10.3390/diagnostics14080839

Abstract:
BACKGROUND: In the evolving field of maxillofacial surgery, integrating advanced technologies like Large Language Models (LLMs) into medical practices, especially for trauma triage, presents a promising yet largely unexplored potential. This study aimed to evaluate the feasibility of using LLMs for triaging complex maxillofacial trauma cases by comparing their performance against the expertise of a tertiary referral center.
METHODS: Based on a comprehensive review of patient records at a tertiary referral center over a one-year period, standardized prompts detailing patient demographics, injury characteristics, and medical histories were created. These prompts were used to compare the triage suggestions of ChatGPT 4.0 and Google GEMINI with the center's recommendations, and the AI's performance was additionally evaluated using the QAMAI and AIPI questionnaires.
RESULTS: The results in 10 cases of major maxillofacial trauma indicated moderate agreement rates between LLM recommendations and the referral center, with some variances in the suggestion of appropriate examinations (70% ChatGPT and 50% GEMINI) and treatment plans (60% ChatGPT and 45% GEMINI). Notably, the study found no statistically significant differences in several areas of the questionnaires, except in the diagnosis accuracy (GEMINI: 3.30, ChatGPT: 2.30; p = 0.032) and relevance of the recommendations (GEMINI: 2.90, ChatGPT: 3.50; p = 0.021). A Spearman correlation analysis highlighted significant correlations within the two questionnaires, specifically between the QAMAI total score and AIPI treatment scores (rho = 0.767, p = 0.010).
CONCLUSIONS: This exploratory investigation underscores the potential of LLMs in enhancing clinical decision making for maxillofacial trauma cases, indicating a need for further research to refine their application in healthcare settings.
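
The two kinds of statistics reported in the RESULTS, per-case agreement rates and a Spearman rank correlation between questionnaire scores, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the example labels, and the score lists are hypothetical and are not the study's data or the authors' analysis code. Significance testing (the p-values) is left out.

```python
from statistics import mean


def agreement_rate(llm_choices, center_choices):
    """Share of cases where the LLM suggestion matches the referral center."""
    matches = sum(a == b for a, b in zip(llm_choices, center_choices))
    return matches / len(center_choices)


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed samples."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical examination suggestions for five cases (not study data).
center = ["CT", "CT", "MRI", "CT", "X-ray"]
llm = ["CT", "CT", "CT", "CT", "X-ray"]
rate = agreement_rate(llm, center)  # 4 of 5 cases match -> 0.8

# Hypothetical questionnaire scores for the same five cases (not study data).
qamai_total = [18, 22, 25, 20, 27]
aipi_treatment = [3, 4, 5, 4, 5]
rho = spearman_rho(qamai_total, aipi_treatment)
print(f"agreement = {rate:.0%}, rho = {rho:.3f}")
```

Ranking with averaged ties matters here because Likert-style questionnaire items produce many tied scores. With a sample as small as 10 cases, any such coefficient carries wide uncertainty.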