AI chatbot

  • Article type: Journal Article
    BACKGROUND: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown.
    OBJECTIVE: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan.
    METHODS: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and converted them into clinical vignettes. Physicians entered the text of each clinical vignette into ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, the top 5 differential diagnosis lists, and the top diagnosis.
    RESULTS: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and for the top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43), although the differences were not significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022).
    CONCLUSIONS: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.
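    The study above describes prompting off-the-shelf ChatGPT models with vignette text and taking the returned top 10 differential diagnoses. The paper's exact prompt and interface are not reproduced here, so the following is only a minimal sketch of that kind of workflow, assuming the OpenAI Python SDK, the publicly available gpt-3.5-turbo and gpt-4 chat models, and placeholder prompt wording and vignette text.

```python
# Minimal sketch: asking a ChatGPT model for a top 10 differential diagnosis
# list from a clinical vignette. Assumes the OpenAI Python SDK (openai>=1.0)
# and an API key in the OPENAI_API_KEY environment variable; the prompt wording,
# model names, and vignette are illustrative, not the study's published protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vignette = "A 55-year-old man presents with fever, weight loss, and night sweats ..."  # placeholder

def top10_differentials(vignette_text: str, model: str = "gpt-4") -> str:
    """Return the model's top 10 differential diagnoses for a vignette."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a general internal medicine physician."},
            {"role": "user",
             "content": "List the top 10 differential diagnoses, most likely first, "
                        "for the following case:\n" + vignette_text},
        ],
    )
    return response.choices[0].message.content

print(top10_differentials(vignette, model="gpt-3.5-turbo"))
print(top10_differentials(vignette, model="gpt-4"))
```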

  • Article type: Journal Article
    Background This study aimed to evaluate the efficacy of ChatGPT, an advanced natural language processing model, in adapting and synthesizing clinical guidelines for diabetic ketoacidosis (DKA) by comparing and contrasting different guideline sources. Methodology We employed a comprehensive comparison approach and examined three reputable guideline sources: Diabetes Canada Clinical Practice Guidelines Expert Committee (2018), Emergency Management of Hyperglycaemia in Primary Care, and Joint British Diabetes Societies (JBDS) 02 The Management of Diabetic Ketoacidosis in Adults. Data extraction focused on diagnostic criteria, risk factors, signs and symptoms, investigations, and treatment recommendations. We compared the synthesized guidelines generated by ChatGPT and identified any misreporting or non-reporting errors. Results ChatGPT was capable of generating a comprehensive table comparing the guidelines. However, multiple recurrent errors, including misreporting and non-reporting errors, were identified, rendering the results unreliable. Additionally, inconsistencies were observed in the repeated reporting of data. The study highlights the limitations of using ChatGPT for the adaptation of clinical guidelines without expert human intervention. Conclusions Although ChatGPT demonstrates the potential for the synthesis of clinical guidelines, the presence of multiple recurrent errors and inconsistencies underscores the need for expert human intervention and validation. Future research should focus on improving the accuracy and reliability of ChatGPT, as well as exploring its potential applications in other areas of clinical practice and guideline development.
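    The comparison described above was carried out by expert reviewers, who flagged misreporting and non-reporting errors in ChatGPT's synthesized guideline table. As an illustration only, the sketch below shows how such an audit could be expressed programmatically; the guideline fields, reference values, and the audit function are hypothetical placeholders, not the study's actual data or method.

```python
# Hypothetical sketch of the error check described above: compare fields that
# ChatGPT extracted for each guideline against expert-verified reference values
# and flag non-reporting (missing) and misreporting (mismatched) errors.
# The field names and example values are placeholders, not the study's data.

REFERENCE = {
    "JBDS 02": {
        "diagnostic_criteria": "pH < 7.3, bicarbonate < 15 mmol/L, ketonaemia >= 3 mmol/L",
        "initial_fluid": "0.9% sodium chloride",
    },
}

chatgpt_output = {
    "JBDS 02": {
        "diagnostic_criteria": "pH < 7.3, bicarbonate < 15 mmol/L, ketonaemia >= 3 mmol/L",
        # "initial_fluid" omitted by the model -> non-reporting error
    },
}

def audit(reference: dict, generated: dict) -> list[str]:
    """Return a list of non-reporting and misreporting errors."""
    errors = []
    for guideline, fields in reference.items():
        extracted = generated.get(guideline, {})
        for field, expected in fields.items():
            if field not in extracted:
                errors.append(f"{guideline}: non-reporting error for '{field}'")
            elif extracted[field] != expected:
                errors.append(f"{guideline}: misreporting error for '{field}'")
    return errors

for err in audit(REFERENCE, chatgpt_output):
    print(err)
```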