ChatGPT

  • Article type: Journal Article
    BACKGROUND: Artificial Intelligence (AI) large language models (LLMs) are tools capable of generating human-like text responses to user queries across topics. The use of these models in various medical contexts is currently being studied; however, their performance and content quality have not been evaluated in specific medical fields.
    OBJECTIVE: This study aimed to compare the performance of the AI LLMs ChatGPT, Gemini and Copilot in providing information to parents about chronic kidney disease (CKD) and to compare the accuracy and quality of that information with a reference source.
    METHODS: In this study, 40 frequently asked questions about CKD were identified. The accuracy and quality of the answers were evaluated with reference to the Kidney Disease: Improving Global Outcomes guidelines. The accuracy of the responses generated by LLMs was assessed using F1, precision and recall scores. The quality of the responses was evaluated using a five-point global quality score (GQS).
    RESULTS: ChatGPT and Gemini achieved high F1 scores of 0.89 and 1, respectively, in the diagnosis and lifestyle categories, demonstrating significant success in generating accurate responses. Furthermore, ChatGPT and Gemini were successful in generating accurate responses with high precision values in the diagnosis and lifestyle categories. In terms of recall values, all LLMs exhibited strong performance in the diagnosis, treatment and lifestyle categories. Average GQS values for the responses generated were 3.46 ± 0.55, 1.93 ± 0.63 and 2.02 ± 0.69 for Gemini, ChatGPT 3.5 and Copilot, respectively. In all categories, Gemini performed better than ChatGPT and Copilot.
    CONCLUSIONS: Although LLMs provide parents with high-accuracy information about CKD, their use is limited compared with that of a reference source. The limitations in the performance of LLMs can lead to misinformation and potential misinterpretations. Therefore, patients and parents should exercise caution when using these models.
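
    The abstract above scores LLM answers with precision, recall and F1 against a guideline-based reference, plus a five-point global quality score (GQS). A minimal sketch of how such per-question scores could be computed is shown below; the fact inventory, binary judgements and rater values are invented for illustration and are not the authors' data or pipeline.

    # Hypothetical scoring sketch: over a shared inventory of candidate facts,
    # mark 1 if the KDIGO-based reference answer asserts the fact and 1 if the
    # LLM answer asserts it, then score the LLM answer against the reference.
    from sklearn.metrics import precision_score, recall_score, f1_score
    import statistics

    reference  = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # fact asserted by the reference answer
    llm_answer = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]   # fact asserted by the LLM answer

    precision = precision_score(reference, llm_answer)
    recall = recall_score(reference, llm_answer)
    f1 = f1_score(reference, llm_answer)
    print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")

    # Global quality score: each rater assigns 1-5 per answer; report mean ± SD.
    gqs_ratings = [4, 3, 4, 3, 4]                  # hypothetical ratings from five raters
    print(f"GQS = {statistics.mean(gqs_ratings):.2f} ± {statistics.stdev(gqs_ratings):.2f}")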

  • Article type: Journal Article
    The release of GPT-4 has garnered widespread attention across various fields, signaling the impending widespread adoption and application of Large Language Models (LLMs). However, previous research has predominantly focused on the technical principles of ChatGPT and its social impact, overlooking its effects on human-computer interaction and user psychology. This paper explores the multifaceted impacts of ChatGPT on human-computer interaction, psychology, and society through a literature review. The author investigates ChatGPT's technical foundation, including its Transformer architecture and RLHF (Reinforcement Learning from Human Feedback) process, enabling it to generate human-like responses. In terms of human-computer interaction, the author studies the significant improvements GPT models bring to conversational interfaces. The analysis extends to psychological impacts, weighing the potential of ChatGPT to mimic human empathy and support learning against the risks of reduced interpersonal connections. In the commercial and social domains, the paper discusses the applications of ChatGPT in customer service and social services, highlighting the improvements in efficiency and challenges such as privacy issues. Finally, the author offers predictions and recommendations for ChatGPT's future development directions and its impact on social relationships.

  • Article type: Journal Article
    OBJECTIVE: The purpose of this study was to assess how well ChatGPT, an AI-powered chatbot, performed in helping to manage pediatric sialadenitis and identify when sialendoscopy was necessary.
    METHODS: Forty-nine clinical cases of pediatric sialadenitis were retrospectively reviewed. ChatGPT was given patient data, and it offered differential diagnoses, proposed further tests, and suggested treatments. The decisions made by the treating otolaryngologists were compared with the answers provided by ChatGPT. ChatGPT response consistency and interrater reliability were analyzed.
    RESULTS: ChatGPT showed 78.57% accuracy in primary diagnosis, with the diagnosis considered likely in a further 17.35% of cases. On the other hand, otolaryngologists recommended fewer further examinations than ChatGPT (111 vs. 60, p < 0.001). For additional exams, poor agreement was found between ChatGPT and otolaryngologists. Only 28.57% of cases received a pertinent and essential treatment plan via ChatGPT, indicating that the platform's treatment recommendations were frequently lacking. For treatment ratings, the judges' interrater reliability was greatest (Kendall's tau = 0.824, p < 0.001). For the most part, ChatGPT's response consistency was high.
    CONCLUSIONS: Although ChatGPT has the potential to correctly diagnose pediatric sialadenitis, there are a number of noteworthy limitations with regard to its ability to suggest further testing and treatment regimens. Before widespread clinical use, more research and confirmation are required. To guarantee that chatbots are utilized properly and effectively to supplement human expertise rather than to replace it, a critical viewpoint is required.
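
    The study above reports the judges' interrater reliability for treatment ratings as Kendall's tau = 0.824. A minimal sketch of that kind of computation with SciPy is shown below; the two raters' ordinal scores are invented for illustration.

    # Hypothetical sketch: Kendall's tau between two raters' ordinal ratings of
    # ChatGPT's treatment suggestions (e.g., 1 = inadequate, 2 = partially
    # adequate, 3 = pertinent and essential).
    from scipy.stats import kendalltau

    rater_a = [3, 2, 3, 1, 2, 3, 3, 1, 2, 3]
    rater_b = [3, 2, 3, 1, 3, 3, 3, 1, 2, 2]

    tau, p_value = kendalltau(rater_a, rater_b)
    print(f"Kendall's tau = {tau:.3f}, p = {p_value:.4f}")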

  • Article type: Journal Article
    OBJECTIVE: In the digital age, patients turn to online sources for lumbar spine fusion information, necessitating a careful study of large language models (LLMs) like chat generative pre-trained transformer (ChatGPT) for patient education.
    METHODS: Our study aims to assess the response quality of OpenAI's ChatGPT 3.5 and Google's Bard to patient questions on lumbar spine fusion surgery. We identified 10 critical questions from 158 frequently asked ones via Google search, which were then presented to both chatbots. Five blinded spine surgeons rated the responses on a 4-point scale from 'unsatisfactory' to 'excellent'. The clarity and professionalism of the answers were also evaluated using a 5-point Likert scale.
    RESULTS: In our evaluation of 10 questions across ChatGPT 3.5 and Bard, 97% of responses were rated as excellent or satisfactory. Specifically, ChatGPT had 62% excellent and 32% minimally clarifying responses, with only 6% needing moderate or substantial clarification. Bard's responses were 66% excellent and 24% minimally clarifying, with 10% requiring more clarification. No significant difference was found in the overall rating distribution between the two models. Both struggled with three specific questions regarding surgical risks, success rates, and selection of surgical approaches (Q3, Q4, and Q5). Interrater reliability was low for both models (ChatGPT: k = 0.041, p = 0.622; Bard: k = -0.040, p = 0.601). While both scored well on understanding and empathy, Bard received marginally lower ratings in empathy and professionalism.
    CONCLUSIONS: ChatGPT 3.5 and Bard effectively answered lumbar spine fusion FAQs, but further training and research are needed to solidify LLMs' role in medical education and healthcare communication.
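
    The kappa values above (ChatGPT: k = 0.041; Bard: k = -0.040) summarize agreement among five blinded raters. A minimal multi-rater agreement sketch is shown below, using Fleiss' kappa as an assumed choice of statistic; the rating matrix is invented, and the paper may have computed a different kappa variant.

    # Hypothetical sketch: five raters each assign one of four grades (0-3) to
    # every chatbot response; Fleiss' kappa summarizes their agreement.
    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # rows = responses, columns = raters, values = grade assigned (0-3)
    ratings = np.array([
        [3, 3, 2, 3, 3],
        [2, 1, 3, 2, 0],
        [3, 2, 2, 3, 1],
        [1, 3, 0, 2, 3],
        [3, 3, 3, 2, 2],
    ])

    counts, _categories = aggregate_raters(ratings)   # subjects x categories count table
    kappa = fleiss_kappa(counts, method="fleiss")
    print(f"Fleiss' kappa = {kappa:.3f}")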

  • Article type: Journal Article
    BACKGROUND: ChatGPT is an AI platform whose relevance in the peer review of scientific articles is steadily growing. Nonetheless, it has sparked debates over its potential biases and inaccuracies. This study aims to assess ChatGPT's ability to qualitatively emulate human reviewers in scientific research.
    METHODS: We included the first submitted version of the latest twenty original research articles published by the 3rd of July 2023, in a high-profile medical journal. Each article underwent evaluation by a minimum of three human reviewers during the initial review stage. Subsequently, three researchers with medical backgrounds and expertise in manuscript revision, independently and qualitatively assessed the agreement between the peer reviews generated by ChatGPT version GPT-4 and the comments provided by human reviewers for these articles. The level of agreement was categorized into complete, partial, none, or contradictory.
    RESULTS: A total of 720 human reviewers' comments were assessed. There was good agreement between the three assessors (overall kappa >0.6). ChatGPT's comments demonstrated complete agreement in terms of quality and substance with 48 (6.7%) human reviewers' comments; partially agreed with 92 (12.8%), identifying issues necessitating further elaboration or recommending supplementary steps to address concerns; had no agreement with 565 (78.5%); and contradicted 15 (2.1%). ChatGPT comments on methods had the lowest proportion of complete agreement (13 comments, 3.6%), while general comments on the manuscript displayed the highest proportion of complete agreement (17 comments, 22.1%).
    CONCLUSIONS: ChatGPT version GPT-4 has a limited ability to emulate human reviewers within the peer review process of scientific research.
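
    The agreement counts reported above can be sanity-checked against the stated percentages (48 + 92 + 565 + 15 = 720 comments); a short check:

    # Reproduce the reported agreement breakdown from the raw counts.
    counts = {"complete": 48, "partial": 92, "none": 565, "contradictory": 15}
    total = sum(counts.values())                       # 720 human reviewer comments
    for level, n in counts.items():
        print(f"{level}: {n}/{total} = {100 * n / total:.1f}%")
    # -> complete 6.7%, partial 12.8%, none 78.5%, contradictory 2.1%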

  • Article type: Journal Article
    BACKGROUND: In the era of the internet, individuals have become increasingly accustomed to gathering necessary information and expressing their opinions on public web-based platforms. The health care sector is no exception, as these comments, to a certain extent, influence people's health care decisions. During the onset of the COVID-19 pandemic, how the medical experience of Chinese patients and their evaluations of hospitals changed remains to be studied. Therefore, we planned to collect patient medical visit data from the internet to reflect the current status of medical relationships under specific circumstances.
    OBJECTIVE: This study aims to explore the differences in patient comments across various stages of the COVID-19 pandemic (before, during, and after), as well as among different types of hospitals (children's hospitals, maternity hospitals, and tumor hospitals). Additionally, by leveraging ChatGPT (OpenAI), the study categorizes the elements of negative hospital evaluations. An analysis is conducted on the acquired data, and potential solutions that could improve patient satisfaction are proposed. This study is intended to assist hospital managers in providing a better experience for patients who are seeking care amid an emergent public health crisis.
    METHODS: Selecting the top 50 comprehensive hospitals nationwide and the top 50 specialized hospitals (children's hospitals, tumor hospitals, and maternity hospitals), we collected patient reviews of these hospitals from the Dianping website. Using ChatGPT, we classified the content of negative reviews. Additionally, we conducted statistical analysis using SPSS (IBM Corp) to examine the scoring and composition of negative evaluations.
    RESULTS: A total of 30,317 valid comments were collected from January 1, 2018, to August 15, 2023, including 7696 negative comments. Manual inspection indicated that ChatGPT had an accuracy rate of 92.05%, with an F1-score of 0.914. Analysis of these data revealed a significant correlation between the comments and ratings received by hospitals during the pandemic. Overall, there was a significant increase in average comment scores during the outbreak (P<.001). Furthermore, there were notable differences in the composition of negative comments among different types of hospitals (P<.001). Children's hospitals received sensitive feedback regarding waiting times and treatment effectiveness, while patients at maternity hospitals showed a greater concern for the attitude of health care providers. Patients at tumor hospitals expressed a desire for timely examinations and treatments, especially during the pandemic period.
    CONCLUSIONS: The COVID-19 pandemic had some association with patient comment scores. There were variations in the scores and content of comments among different types of specialized hospitals. Using ChatGPT to analyze patient comment content represents an innovative approach for statistically assessing factors contributing to patient dissatisfaction. The findings of this study could provide valuable insights for hospital administrators to foster more harmonious physician-patient relationships and enhance hospital performance during public health emergencies.
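
    The methods above use ChatGPT to sort negative reviews into complaint categories and then check the labels against manual inspection (92.05% accuracy, F1-score 0.914). A minimal sketch of such a classification step via the OpenAI chat API is shown below; the category list, prompt wording and model name are assumptions rather than the study's exact setup.

    # Hypothetical sketch: send one negative review to the chat API with a fixed
    # category list and read back a single label for later comparison with
    # manual inspection.
    from openai import OpenAI

    CATEGORIES = ["waiting time", "treatment effectiveness", "staff attitude",
                  "cost", "facility environment", "other"]

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def classify_review(review_text: str) -> str:
        """Map one negative hospital review onto a single complaint category."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",   # assumed model; the study only says "ChatGPT"
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Classify the hospital review into exactly one of: "
                            + ", ".join(CATEGORIES) + ". Reply with the category name only."},
                {"role": "user", "content": review_text},
            ],
        )
        return response.choices[0].message.content.strip().lower()

    print(classify_review("挂号排了三个小时，医生只看了两分钟。"))  # expected: waiting time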

  • Article type: Journal Article
    BACKGROUND: Responses from artificial intelligence chatbot tools might discern patterns and correlations that may elude human observation, leading to more accurate and timely interventions. However, their reliability in answering healthcare-related questions is still debated. This study aimed to assess the performance of three GPT-based chatbots on questions about prosthetic joint infections (PJI).
    METHODS: Thirty questions concerning the diagnosis and treatment of hip and knee PJIs, stratified by a priori established difficulty, were generated by a team of experts and administered to ChatGPT 3.5, BingChat, and ChatGPT 4.0. Responses were rated by three orthopedic surgeons and two infectious diseases physicians using a five-point Likert-like scale with numerical values to quantify response quality. Inter-rater reliability was assessed with intraclass correlation statistics.
    RESULTS: Responses averaged "good to very good" for all chatbots examined, in both diagnosis and treatment, with no significant differences according to question difficulty. However, BingChat ratings were significantly lower in the treatment setting (p = 0.025), particularly in terms of accuracy (p = 0.02) and completeness (p = 0.004). Agreement in ratings among examiners appeared to be very poor.
    CONCLUSIONS: On average, the quality of responses is rated positively by experts, but with ratings that frequently may vary widely. This currently suggests that AI chatbot tools are still unreliable in the management of PJI.
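
    Inter-rater reliability above is assessed with intraclass correlation among three orthopedic surgeons and two infectious diseases physicians. A minimal sketch using pingouin's ICC implementation is shown below; the Likert-like scores are invented for illustration.

    # Hypothetical sketch: five raters score each chatbot response on a 1-5
    # scale; the intraclass correlation coefficient (ICC) summarizes agreement.
    import pandas as pd
    import pingouin as pg

    scores = {
        "rater1": [4, 5, 3, 4, 2, 5],
        "rater2": [3, 4, 2, 5, 3, 4],
        "rater3": [5, 3, 4, 3, 2, 4],
        "rater4": [2, 4, 3, 4, 1, 5],
        "rater5": [4, 2, 5, 3, 3, 3],
    }
    records = [
        {"question": q, "rater": rater, "score": s}
        for rater, values in scores.items()
        for q, s in enumerate(values)
    ]

    df = pd.DataFrame(records)
    icc = pg.intraclass_corr(data=df, targets="question", raters="rater", ratings="score")
    print(icc[["Type", "ICC", "CI95%"]])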

  • Article type: Journal Article
    OBJECTIVE: Large language models (LLMs) are rapidly advancing and demonstrating high performance in understanding textual information, suggesting potential applications in interpreting patient histories and documented imaging findings. As LLMs continue to improve, their diagnostic abilities are expected to be enhanced further. However, there is a lack of comprehensive comparisons between LLMs from different manufacturers. In this study, we aimed to test the diagnostic performance of the three latest major LLMs (GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro) using Radiology Diagnosis Please Cases, a monthly diagnostic quiz series for radiology experts.
    METHODS: Clinical history and imaging findings, provided textually by the case submitters, were extracted from 324 quiz questions originating from Radiology Diagnosis Please cases published between 1998 and 2023. The top three differential diagnoses were generated by GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, using their respective application programming interfaces. A comparative analysis of diagnostic performance among these three LLMs was conducted using Cochran's Q and post hoc McNemar's tests.
    RESULTS: The respective diagnostic accuracies of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro for primary diagnosis were 41.0%, 54.0%, and 33.9%, which further improved to 49.4%, 62.0%, and 41.0%, when considering the accuracy of any of the top three differential diagnoses. Significant differences in the diagnostic performance were observed among all pairs of models.
    CONCLUSIONS: Claude 3 Opus outperformed GPT-4o and Gemini 1.5 Pro in solving radiology quiz cases. These models appear capable of assisting radiologists when supplied with accurate evaluations and worded descriptions of imaging findings.
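
    The comparison above uses Cochran's Q with post hoc McNemar tests over per-case correctness. A minimal sketch of that analysis with statsmodels is shown below; the binary outcomes are simulated at roughly the reported accuracies, and multiplicity correction for the pairwise tests is omitted for brevity.

    # Hypothetical sketch: each row is a quiz case, each column an LLM, and a
    # cell is 1 if that LLM's primary diagnosis was correct.
    import numpy as np
    from itertools import combinations
    from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

    rng = np.random.default_rng(0)
    correct = np.column_stack([
        rng.binomial(1, 0.41, size=324),   # GPT-4o
        rng.binomial(1, 0.54, size=324),   # Claude 3 Opus
        rng.binomial(1, 0.34, size=324),   # Gemini 1.5 Pro
    ])

    q_result = cochrans_q(correct)
    print(f"Cochran's Q = {q_result.statistic:.2f}, p = {q_result.pvalue:.4f}")

    models = ["GPT-4o", "Claude 3 Opus", "Gemini 1.5 Pro"]
    for i, j in combinations(range(3), 2):
        a, b = correct[:, i], correct[:, j]
        table = [[np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
                 [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))]]
        result = mcnemar(table, exact=False, correction=True)
        print(f"{models[i]} vs {models[j]}: McNemar p = {result.pvalue:.4f}")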

  • Article type: Journal Article
    OBJECTIVE: This study compared three artificial intelligence (AI) platforms' potential to identify drug therapy communication competencies expected of a graduating medical doctor.
    METHODS: We presented three AI platforms, namely, Poe Assistant©, ChatGPT© and Google Bard©, with structured queries to generate communication skill competencies and case scenarios appropriate for graduating medical doctors. These case scenarios comprised 15 prototypical medical conditions that required drug prescriptions. Two authors independently evaluated the AI-enhanced clinical encounters, which integrated a diverse range of information to create patient-centred care plans. Through a consensus-based approach using a checklist, the communication components generated for each scenario were assessed. The instructions and warnings provided for each case scenario were evaluated by referencing the British National Formulary.
    RESULTS: AI platforms demonstrated overlap in the competency domains generated, albeit with variations in wording. The domains of knowledge (basic and clinical pharmacology, prescribing, communication and drug safety) were unanimously recognized by all platforms. A broad consensus between Poe Assistant© and ChatGPT© on drug therapy-related communication issues specific to each case scenario was evident. The consensus primarily encompassed salutation, the generic drug prescribed, treatment goals and follow-up schedules. Differences were observed in patient instruction clarity, listed side effects, warnings and patient empowerment. Google Bard© did not provide guidance on patient communication issues.
    CONCLUSIONS: AI platforms recognized competencies with variations in how these were stated. Poe Assistant© and ChatGPT© exhibited alignment of communication issues. However, significant discrepancies were observed in specific skill components, indicating the necessity of human intervention to critically evaluate AI-generated outputs.

  • Article type: Journal Article
    BACKGROUND: There has been an explosion of commentary and discussion about the ethics and utility of using artificial intelligence in medicine, and its practical use in medical education is still being debated. Through qualitative research methods, this study aims to highlight the advantages and pitfalls of using ChatGPT in the development of clinical reasoning cases for medical student education.
    METHODS: Five highly experienced faculty in medical education were provided instructions to create unique clinical reasoning cases for three different chief concerns using ChatGPT 3.0. Faculty were then asked to reflect on and review the created cases. Finally, a focus group was conducted to further analyze and describe their experiences with the new technology.
    RESULTS: Overall, faculty found ChatGPT easy to use for developing clinical reasoning cases but difficult to steer toward certain objectives, and largely incapable of creating enough complexity for student use without heavy editing. The created cases did provide a helpful starting point and were extremely efficient to produce; however, faculty did encounter some medical inaccuracies and fact fabrication.
    CONCLUSIONS: There is value in using ChatGPT to develop curricular content, especially clinical reasoning cases, but the output needs to be comprehensively reviewed and verified. To utilize the tool efficiently and effectively, educators will need to develop a framework that can be easily translated into simple prompts that ChatGPT can understand. Future work will need to strongly consider the risks of recirculating biases and misinformation.