Keywords: Clinical Practice Guidelines; Ménière's disease; artificial intelligence; patient education

Source: DOI:10.1002/oto2.163   PDF (PubMed)

Abstract:
Objective: Evaluate the quality of responses from Chat Generative Pre-Trained Transformer (ChatGPT) models compared to the answers to "Frequently Asked Questions" (FAQs) from the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) Clinical Practice Guidelines (CPG) for Ménière's disease (MD).
Study Design: Comparative analysis.
Setting: The AAO-HNS CPG for MD includes FAQs that clinicians can give to patients to address MD-related questions. The ability of ChatGPT to properly educate patients regarding MD is unknown.
Methods: ChatGPT-3.5 and 4.0 were each prompted with 16 questions from the MD FAQs. Each response was rated in terms of (1) comprehensiveness, (2) extensiveness, (3) presence of misleading information, and (4) quality of resources. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES).
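For reference, the FRES and FKGL thresholds cited in the results follow the standard Flesch formulas. The minimal Python sketch below is illustrative only: its vowel-group syllable counter is a rough assumption, and the study would have used a validated readability tool rather than this heuristic.

import re

def count_syllables(word):
    # Rough heuristic: one syllable per run of consecutive vowels (approximation only).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text):
    # Simple regex tokenization; validated tools use more careful sentence/syllable handling.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)           # average words per sentence
    spw = syllables / len(words)                # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease Score (target >= 60)
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # Flesch-Kincaid Grade Level (target <= 6)
    return fres, fkgl

print(flesch_scores("Meniere's disease causes vertigo. A low-salt diet may reduce attacks."))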
Results: ChatGPT-3.5 was comprehensive in 5 responses whereas ChatGPT-4.0 was comprehensive in 9 (31.3% vs 56.3%, P = .2852). ChatGPT-3.5 and 4.0 were extensive in all responses (P = 1.0000). ChatGPT-3.5 was misleading in 5 responses whereas ChatGPT-4.0 was misleading in 3 (31.3% vs 18.75%, P = .6851). ChatGPT-3.5 had quality resources in 10 responses whereas ChatGPT-4.0 had quality resources in 16 (62.5% vs 100%, P = .0177). The AAO-HNS CPG FRES (62.4 ± 16.6) met the appropriate readability threshold of at least 60, while both ChatGPT-3.5 (39.1 ± 7.3) and 4.0 (42.8 ± 8.5) failed to meet this standard. All platforms had mean FKGL scores that exceeded the recommended level of grade 6 or lower.
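The abstract does not name the statistical test behind these P values, but the reported proportions are consistent with two-sided Fisher's exact tests on 2 x 2 tables (5/16 vs 9/16 gives P ≈ .2852, 5/16 vs 3/16 gives P ≈ .6851, and 10/16 vs 16/16 gives P ≈ .0177). A minimal sketch, assuming Fisher's exact test was the method used:

from scipy.stats import fisher_exact

comparisons = {
    # outcome: [[ChatGPT-3.5 yes, no], [ChatGPT-4.0 yes, no]], out of 16 responses each
    "comprehensive": [[5, 11], [9, 7]],
    "misleading": [[5, 11], [3, 13]],
    "quality resources": [[10, 6], [16, 0]],
}

for outcome, table in comparisons.items():
    _, p = fisher_exact(table, alternative="two-sided")
    print(f"{outcome}: P = {p:.4f}")
# Prints P = 0.2852, 0.6851, and 0.0177, matching the values reported above.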
Conclusion: While ChatGPT-4.0 had significantly better resource reporting, both models have room for improvement in being more comprehensive, more readable, and less misleading for patients.