Keywords: ChatGPT GPT accuracy artificial intelligence head neck surgery images laryngology otolaryngology picture video

Source: DOI:10.1002/ohn.897

Abstract:
OBJECTIVE: To investigate the consistency of Chatbot Generative Pretrained Transformer (ChatGPT)-4 in the analysis of clinical pictures of common laryngological conditions.
STUDY DESIGN: Prospective uncontrolled study.
SETTING: Multicenter study.
METHODS: Patient history and clinical videolaryngostroboscopic images were presented to ChatGPT-4 for differential diagnoses, management, and treatment(s). ChatGPT-4 responses were assessed by 3 blinded laryngologists with the artificial intelligence performance instrument (AIPI). The complexity of cases and the consistency between practitioners and ChatGPT-4 for interpreting clinical images were evaluated with a 5-point Likert Scale. The intraclass correlation coefficient (ICC) was used to measure the strength of interrater agreement.
RESULTS: Forty patients with a mean complexity score of 2.60 ± 1.15 were included. The mean consistency score for ChatGPT-4 image interpretation was 2.46 ± 1.42. ChatGPT-4 perfectly analyzed the clinical images in 6 cases (15%; 5/5), while the consistency between GPT-4 and judges was high in 5 cases (12.5%; 4/5). Judges reported an ICC of 0.965 for the consistency score (P = .001). ChatGPT-4 erroneously documented vocal fold irregularity (mass or lesion), glottic insufficiency, and vocal cord paralysis in 21 (52.5%), 2 (5%), and 5 (12.5%) cases, respectively. ChatGPT-4 and practitioners indicated 153 and 63 additional examinations, respectively (P = .001). The ChatGPT-4 primary diagnosis was correct in 20.0% to 25.0% of cases. The clinical image consistency score was significantly associated with the AIPI score (rs = 0.830; P = .001).
CONCLUSIONS: ChatGPT-4 is more effective in providing a primary diagnosis than in analyzing clinical images or in selecting the most adequate additional examinations and treatments.
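The interrater agreement reported above is an intraclass correlation coefficient. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater), a common choice for panels of raters scoring on a Likert scale; the ratings are hypothetical and whether the study used this exact ICC variant is an assumption, not stated in the abstract.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: (n_subjects, k_raters) array-like of scores."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()  # between-subjects SS
    ss_cols = n * ((col_means - grand) ** 2).sum()  # between-raters SS
    ss_err = ss_total - ss_rows - ss_cols           # residual SS
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Hypothetical 1-5 Likert consistency scores from 3 raters on 6 cases
# (invented for illustration; NOT the study's data).
scores = [[4, 4, 5],
          [2, 2, 2],
          [5, 5, 5],
          [1, 2, 1],
          [3, 3, 3],
          [4, 5, 4]]
print(round(icc2_1(scores), 3))  # high agreement among the three raters
```

Values near 1, such as the 0.965 reported by the judges, indicate near-identical scoring across raters.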