MeSH: Humans; Female; Artificial Intelligence; Male; Middle Aged; Prospective Studies; Aged; Adult; Ear Diseases / diagnosis; Aged, 80 and over

Source: DOI: 10.1097/MAO.0000000000004267

Abstract:
OBJECTIVE: To investigate the accuracy of language-model artificial intelligence (AI) in diagnosing otologic conditions by comparing its predictions with diagnoses made by board-certified otologic/neurotologic surgeons using patient-described symptoms.
STUDY DESIGN: Prospective cohort study.
SETTING: Tertiary care center.
PATIENTS: One hundred adults participated in the study, comprising new patients and established patients returning with new symptoms. Individuals were excluded if they could not provide a written description of their symptoms.
INTERVENTIONS: Summaries of each patient's symptoms were supplied to three publicly available AI platforms: ChatGPT 4.0, Google Bard, and the WebMD "Symptom Checker."
MAIN OUTCOME MEASURES: The diagnostic accuracy of each AI platform was assessed against the diagnosis made by a neurotologist, first using the same written information provided to the AI platforms and again after a complete history and physical examination.
RESULTS: The study included 100 patients (52 men and 48 women; average age, 59.2 yr). Fleiss' kappa between AI and the physician was -0.103 (p < 0.01). The chi-squared test between AI and the physician yielded χ² = 12.95 (df = 2; p < 0.001). Fleiss' kappa between AI models was 0.409. Diagnostic accuracies were 22.45%, 12.24%, and 5.10% for ChatGPT 4.0, Google Bard, and WebMD, respectively.
CONCLUSIONS: Contemporary language-model AI platforms can generate extensive differential diagnoses with limited data input. However, physicians can refine these diagnoses through focused history-taking, physical examination, and clinical experience, skills that current AI platforms lack.
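The agreement figures above are Fleiss' kappa values, which measure inter-rater agreement beyond chance across more than two raters. As a rough illustration only (not the study's own analysis code), here is a minimal Python sketch of the statistic; the toy count matrix, rater roles, and category labels are hypothetical:

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for a (subjects x categories) matrix where
    counts[i, j] is the number of raters placing subject i in category j.
    Assumes the same number of raters for every subject."""
    n_subjects, _ = counts.shape
    n_raters = counts[0].sum()
    # Overall share of assignments falling in each category
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    # Observed pairwise agreement for each subject
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()                 # mean observed agreement
    p_e = np.square(p_j).sum()         # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 3 raters (e.g., the three AI platforms) sorting
# 4 cases into two bins ("matches physician diagnosis" vs. "does not").
counts = np.array([[2, 1], [1, 2], [3, 0], [0, 3]])
print(round(fleiss_kappa(counts), 4))  # 0.3333 for this toy matrix
```

A kappa near 1 indicates strong agreement, near 0 indicates chance-level agreement, and negative values (as between the AI platforms and the physician here) indicate agreement worse than chance.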