Keywords: ChatGPT GPT accuracy artificial intelligence head neck surgery images laryngology otolaryngology picture video

Source: DOI:10.1002/ohn.897

Abstract:
OBJECTIVE: To investigate the consistency of Chatbot Generative Pretrained Transformer (ChatGPT)-4 in the analysis of clinical pictures of common laryngological conditions.
STUDY DESIGN: Prospective uncontrolled study.
SETTING: Multicenter study.
METHODS: Patient history and clinical videolaryngostroboscopic images were presented to ChatGPT-4 for differential diagnoses, management, and treatment(s). ChatGPT-4 responses were assessed by 3 blinded laryngologists with the artificial intelligence performance instrument (AIPI). The complexity of cases and the consistency between practitioners and ChatGPT-4 for interpreting clinical images were evaluated with a 5-point Likert Scale. The intraclass correlation coefficient (ICC) was used to measure the strength of interrater agreement.
RESULTS: Forty patients with a mean complexity score of 2.60 ± 1.15 were included. The mean consistency score for ChatGPT-4 image interpretation was 2.46 ± 1.42. ChatGPT-4 perfectly analyzed the clinical images in 6 cases (15%; 5/5), while the consistency between GPT-4 and judges was high in 5 cases (12.5%; 4/5). Judges reported an ICC of 0.965 for the consistency score (P = .001). ChatGPT-4 erroneously documented vocal fold irregularity (mass or lesion), glottic insufficiency, and vocal cord paralysis in 21 (52.5%), 2 (5%), and 5 (12.5%) cases, respectively. ChatGPT-4 and practitioners indicated 153 and 63 additional examinations, respectively (P = .001). The ChatGPT-4 primary diagnosis was correct in 20.0% to 25.0% of cases. The clinical image consistency score was significantly associated with the AIPI score (rs = 0.830; P = .001).
CONCLUSIONS: ChatGPT-4 is more effective in providing a primary diagnosis than in analyzing clinical images or in selecting the most adequate additional examinations and treatments.
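The interrater agreement reported above is an intraclass correlation coefficient. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater), a common choice for panels of raters scoring on a Likert scale; the ratings are hypothetical and whether the study used this exact ICC variant is an assumption, not stated in the abstract.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: (n_subjects, k_raters) array-like of scores."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()  # between-subjects SS
    ss_cols = n * ((col_means - grand) ** 2).sum()  # between-raters SS
    ss_err = ss_total - ss_rows - ss_cols           # residual SS
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Hypothetical 1-5 Likert consistency scores from 3 raters on 6 cases
# (invented for illustration; NOT the study's data).
scores = [[4, 4, 5],
          [2, 2, 2],
          [5, 5, 5],
          [1, 2, 1],
          [3, 3, 3],
          [4, 5, 4]]
print(round(icc2_1(scores), 3))  # high agreement among the three raters
```

Values near 1, such as the 0.965 reported by the judges, indicate near-identical scoring across raters.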