Keywords: ChatGPT in healthcare; ChatGPT-4; lab interpretation; large language model; medication management; retrospective comparative study; rural setting

Source: DOI: 10.7759/cureus.55789 | PDF (PubMed)

Abstract:
Background Although ChatGPT has demonstrated impressive abilities in solving clinical vignettes and medical questions, studies assessing ChatGPT with real patient data remain scarce. Because real-world cases add complexity, ChatGPT's utility in guiding treatment must be tested on such data to better assess its accuracy and dependability. In this study, we compared a rural cardiologist's medication recommendations with those of GPT-4 for patients with lab review appointments. Methodology We reviewed the lab review appointments of 40 hypertension patients, noting their age, sex, medical conditions, medications and dosages, and current and past lab values. The cardiologist's medication recommendations (decreasing dose, increasing dose, stopping, or adding medications) from the most recent lab visit, if any, were recorded for each patient. The data collected for each patient were entered into GPT-4 using a set prompt, and the model's resulting medication recommendations were recorded. Results Of the 40 patients, 95% had conflicting overall recommendations between the physician and GPT-4, and only 10.2% of the specific medication recommendations matched between the two. Cohen's kappa coefficient was -0.0127, indicating no agreement between the cardiologist and GPT-4 on overall medication changes for a patient. Possible reasons for this discrepancy include differing optimal lab value ranges, a lack of holistic analysis by GPT-4, and the need to provide further supplementary information to the model. Conclusions The study findings showed a significant difference between the cardiologist's medication recommendations and those of ChatGPT-4. Future research should continue to test GPT-4 in clinical settings to validate its abilities in the real world, where more intricacies and challenges exist.
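The agreement statistic reported above, Cohen's kappa, compares the observed rate of agreement between two raters to the rate expected by chance; a value near 0 (or below it, as with the study's -0.0127) means agreement no better than random. The sketch below is not the authors' analysis code; the rater labels are hypothetical, and it simply illustrates how a negative kappa can arise from binary "change medication?" calls.

```python
# Cohen's kappa: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
# agreement rate and p_e is the agreement expected by chance from each
# rater's marginal label frequencies.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Compute Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: 1 = recommend a medication change, 0 = no change.
physician = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
gpt4      = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
print(round(cohens_kappa(physician, gpt4), 3))  # → -0.4 (worse than chance)
```

With these made-up labels the raters agree on only 3 of 10 patients while chance alone predicts 5, so kappa is negative, the same qualitative pattern the study reports between the cardiologist and GPT-4.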