Keywords: ChatGPT; atypical presentation; common disease; diagnosis; diagnostic accuracy; patient safety

MeSH: Humans; Diagnosis, Differential; Artificial Intelligence; Diagnostic Errors / statistics & numerical data / prevention & control

Source: DOI: 10.2196/58758   PDF (PubMed)

Abstract:
BACKGROUND: The persistence of diagnostic errors, despite advances in medical knowledge and diagnostics, highlights the importance of understanding atypical disease presentations and their contribution to mortality and morbidity. Artificial intelligence (AI), particularly generative pretrained transformers like GPT-4, holds promise for improving diagnostic accuracy, but its handling of atypical presentations requires further exploration.
OBJECTIVE: This study aimed to assess the diagnostic accuracy of ChatGPT in generating differential diagnoses for atypical presentations of common diseases, with a focus on the model's reliance on patient history during the diagnostic process.
METHODS: We used 25 clinical vignettes from the Journal of Generalist Medicine characterizing atypical manifestations of common diseases. Two general medicine physicians categorized the cases by degree of atypicality. ChatGPT was then used to generate differential diagnoses from the clinical information provided. Concordance between the AI-generated and final diagnoses was measured, focusing on the top-ranked diagnosis (top 1) and the top 5 differential diagnoses (top 5).
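To make the evaluation metric concrete, the following is a minimal Python sketch of top-1/top-5 concordance scoring. The function name, data layout, and the exact string-match criterion are illustrative assumptions, not the study's actual procedure, which presumably involved physician adjudication of diagnostic equivalence.

    # Illustrative sketch of top-k concordance scoring (not the study's code).
    # Each case pairs a final diagnosis with the model's ranked differential list.

    def concordance_rates(cases: list[tuple[str, list[str]]]) -> tuple[float, float]:
        """Return (top-1 rate, top-5 rate) over all cases.

        cases: (final_diagnosis, ranked_differentials) pairs; matching here is
        simple case-insensitive string equality, a deliberate simplification.
        """
        top1 = top5 = 0
        for final_dx, ranked in cases:
            ranked = [d.lower() for d in ranked]
            if ranked[:1] == [final_dx.lower()]:
                top1 += 1
            if final_dx.lower() in ranked[:5]:
                top5 += 1
        n = len(cases)
        return top1 / n, top5 / n

    # Hypothetical vignette: the correct diagnosis appears third in the
    # model's list, so it counts toward top 5 but not top 1.
    cases = [("pulmonary embolism",
              ["pneumonia", "heart failure", "pulmonary embolism"])]
    print(concordance_rates(cases))  # (0.0, 1.0)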
RESULTS: ChatGPT's diagnostic accuracy decreased as atypicality increased. For category 1 (C1) cases, the concordance rates were 17% (n=1) for the top 1 and 67% (n=4) for the top 5. Categories 3 (C3) and 4 (C4) showed 0% concordance for the top 1 and markedly lower rates for the top 5, indicating difficulty with highly atypical cases. The χ² test revealed no significant difference in top 1 differential diagnosis accuracy between the less atypical (C1+C2) and more atypical (C3+C4) groups (χ²₁=2.07; n=25; P=.13). However, a significant difference was found in the top 5 analysis, with the less atypical cases showing higher accuracy (χ²₁=4.01; n=25; P=.048).
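For readers who want to see the shape of this group comparison, here is a minimal sketch of a 2×2 χ² test of top-5 concordance by atypicality group. The counts below are placeholders, not the study's data; reproducing the reported χ²₁=4.01 would require the per-category case counts, which the abstract does not give, and whether the authors applied a continuity correction is likewise not stated.

    # Sketch of the group comparison as a 2x2 chi-square test of independence.
    # Rows: less atypical (C1+C2) vs. more atypical (C3+C4) cases.
    # Columns: final diagnosis within the top 5 (hit) vs. not (miss).
    from scipy.stats import chi2_contingency

    observed = [
        [8, 5],   # placeholder counts: less atypical group [hits, misses]
        [3, 9],   # placeholder counts: more atypical group [hits, misses]
    ]
    # correction=False disables Yates' continuity correction; the abstract
    # does not say whether the study used it.
    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(f"chi2({dof}) = {chi2:.2f}, P = {p:.3f}")

With only 25 cases, some expected cell counts fall below 5, so Fisher's exact test (scipy.stats.fisher_exact) would be a common alternative for a table this small.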
CONCLUSIONS: ChatGPT-4 demonstrates potential as an auxiliary tool for diagnosing typical and mildly atypical presentations of common diseases, but its performance declines with greater atypicality. The findings underscore the need for AI systems to encompass a broader range of linguistic capabilities, cultural understanding, and diverse clinical scenarios to improve diagnostic utility in real-world settings.