Response accuracy of ChatGPT 3.5, Copilot, and Gemini in interpreting biochemical laboratory data: a pilot study

Abstract:

The release of ChatGPT at the end of 2022 marked the start of a new era in how people think about and use technology. Artificial intelligence models (AIs) such as Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to affect every aspect of our lives, including the interpretation of laboratory data. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Biochemical laboratory data for ten simulated patients, comprising serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), and HbA1c, were interpreted by the three AIs, and the responses were then evaluated by three raters. The study used two approaches: the first supplied all of the biochemical data, and the second supplied only the kidney function data. In the first approach, Copilot had the highest accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc test showed that Copilot had the highest mean rank, and the pairwise comparisons showed significant differences between Copilot and ChatGPT-3.5 (P = 0.002) and between Copilot and Gemini (P = 0.008). In the second approach, Copilot again had the highest accuracy, and the Friedman test with Dunn's post-hoc analysis again gave Copilot the highest mean rank. The Wilcoxon signed-rank test showed no significant difference (P = 0.5) between Copilot's responses when all laboratory data were supplied and when only kidney function data were supplied. Copilot was more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5, and its consistent responses across the two data subsets highlight its reliability in this context.
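The Friedman comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not report the rating scale, the raw scores, or the software used, so everything below is an assumption made for illustration: the function names `average_ranks` and `friedman_three_groups` are hypothetical, the example scores are invented and are not the study's data, and the p-value relies on the chi-square approximation with two degrees of freedom (which has a simple closed form when exactly three groups are compared). Dunn's post-hoc test is not included.

```python
import math


def average_ranks(values):
    """Rank values from 1..k; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_three_groups(rows):
    """Tie-corrected Friedman test for exactly three related groups.

    rows: one (a, b, c) tuple of scores per case (here, per simulated patient).
    Returns (statistic, p_value, mean_ranks).
    """
    n, k = len(rows), 3
    if any(len(row) != k for row in rows):
        raise ValueError("each row needs exactly three scores")
    ranked = [average_ranks(list(row)) for row in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    sum_sq = sum(x * x for r in ranked for x in r)
    correction = n * k * (k + 1) ** 2 / 4
    if sum_sq - correction <= 0:
        raise ValueError("all scores are tied within every row")
    between = sum((s - n * (k + 1) / 2) ** 2 for s in rank_sums)
    statistic = (k - 1) * between / (sum_sq - correction)
    # Chi-square survival function for df = 2 is exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value, [s / n for s in rank_sums]


# Hypothetical accuracy scores (NOT the study's data), one row per case,
# columns ordered as (Copilot, Gemini, ChatGPT-3.5).
hypothetical_scores = [(5, 4, 3), (5, 3, 3), (4, 4, 2), (5, 4, 4), (4, 3, 2)]
stat, p, mean_ranks = friedman_three_groups(hypothetical_scores)
print(f"Q = {stat:.3f}, p = {p:.4f}, mean ranks = {mean_ranks}")
```

A higher mean rank corresponds to higher scores within each case, which is the sense in which the abstract reports Copilot as having the highest mean rank. With only ten cases the chi-square approximation is rough, so a statistics package would be the safer choice for real data.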
