Keywords: ChatGPT, EQIP tool, artificial intelligence, bile, biliary, chatbot, chatbots, conversational agent, conversational agents, gall, gallstone, hepatic, internal medicine, internet information, liver, medical information, pancreas, pancreatic, pancreatitis, patient information

MeSH: Humans; Artificial Intelligence; Reproducibility of Results; Health Personnel; Internet; Language

Source: DOI: 10.2196/47479; PDF (PubMed)

Abstract:
Background: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot capable of answering freely formulated, complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of the medical information the AI provides.
Objective: We aimed to assess the reliability of medical information provided by ChatGPT.
Methods: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was assessed with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool measures the quality of internet-available patient information and consists of 36 items divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and submitted to ChatGPT, and agreement between the guidelines and the AI answers was rated independently by 2 authors. All queries were repeated 3 times to measure the internal consistency of ChatGPT.
Results: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) of the total 36 items. By subsection, the median scores for content, identification, and structure were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and the answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. The internal consistency of the answers provided by ChatGPT was 100%.
Conclusions: ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.
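The interrater agreement reported in the Results was quantified with the Fleiss κ. As a rough illustration of how that statistic is computed, here is a minimal sketch on invented toy ratings (not the study's 25 guideline-question ratings):

```python
# Hedged sketch: Fleiss' kappa, the inter-rater agreement statistic
# reported in the Results (kappa = 0.78). The data below are illustrative
# toy ratings, NOT the study's actual guideline-agreement assessments.

def fleiss_kappa(matrix):
    """matrix[i][j] = number of raters assigning item i to category j.
    Every item must be rated by the same number of raters."""
    N = len(matrix)       # number of items
    n = sum(matrix[0])    # raters per item
    k = len(matrix[0])    # number of categories
    # Overall proportion of all ratings falling into each category.
    p = [sum(row[j] for row in matrix) / (N * n) for j in range(k)]
    # Mean observed per-item agreement.
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in matrix) / N
    # Expected agreement by chance.
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 2 raters classify 10 AI answers as "agrees with guideline"
# (column 0) or "disagrees" (column 1); they concur on 8 of 10 items.
ratings = [[2, 0]] * 5 + [[0, 2]] * 3 + [[1, 1]] * 2
kappa = fleiss_kappa(ratings)
print(round(kappa, 3))  # ~0.583 for this toy data
```

On the conventional Landis and Koch scale, values of 0.61-0.80 denote substantial agreement, which is how the study's κ of 0.78 is interpreted.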