Keywords: 177Lu-PSMA-617 therapy; Bard; ChatGPT; artificial intelligence chatbot; information literacy; machine learning; prostate cancer

Source: DOI: 10.3389/fonc.2024.1386718 (PubMed)

Abstract:
Background: Many patients use artificial intelligence (AI) chatbots as a rapid source of health information. This raises important questions about the reliability and effectiveness of AI chatbots in delivering accurate and understandable information.
Objective: To evaluate and compare the accuracy, conciseness, and readability of responses from OpenAI ChatGPT-4 and Google Bard to patient inquiries concerning the novel 177Lu-PSMA-617 therapy for prostate cancer.
Methods: Two experts listed the 12 questions most commonly asked by patients about 177Lu-PSMA-617 therapy. These twelve questions were posed to OpenAI ChatGPT-4 and Google Bard. The AI-generated responses were distributed via an online survey platform (Qualtrics) and rated blindly by eight experts. The performance of the AI chatbots was evaluated and compared across three domains: accuracy, conciseness, and readability. Potential safety concerns associated with the AI-generated answers were also examined. The Mann-Whitney U and chi-square tests were used to compare the performance of the two chatbots.
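For readers curious how such a comparison might be run in practice, the minimal sketch below uses Python's SciPy library on hypothetical rating data; the scores and contingency counts are placeholders for illustration only, not the study's actual data.

```python
# A minimal sketch of the statistical comparison described in the Methods,
# assuming hypothetical expert ratings (the study's real data are not shown).
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(0)

# Hypothetical accuracy ratings: 96 assessments per chatbot
# (12 responses x 8 experts), each on an ordinal scale.
chatgpt_accuracy = rng.integers(2, 5, size=96)  # placeholder scores
bard_accuracy = rng.integers(1, 5, size=96)     # placeholder scores

# Mann-Whitney U test: compares two independent samples of ordinal ratings.
u_stat, p_value = mannwhitneyu(chatgpt_accuracy, bard_accuracy,
                               alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

# Chi-square test: compares categorical outcomes, e.g. counts of responses
# judged misleading vs. not misleading per chatbot (made-up counts).
contingency = np.array([[16, 80],   # ChatGPT-4: misleading, not misleading
                        [28, 68]])  # Bard: misleading, not misleading
chi2, p_chi, dof, _ = chi2_contingency(contingency)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_chi:.3f}")
```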
Results: Eight experts participated in the survey, evaluating the 12 AI-generated responses across the three domains of accuracy, conciseness, and readability, yielding 96 assessments (12 responses × 8 experts) per domain for each chatbot. ChatGPT-4 provided more accurate answers than Bard (2.95 ± 0.671 vs 2.73 ± 0.732, p=0.027). Bard's responses had better readability than ChatGPT-4's (2.79 ± 0.408 vs 2.94 ± 0.243, p=0.003). ChatGPT-4 and Bard achieved comparable conciseness scores (3.14 ± 0.659 vs 3.11 ± 0.679, p=0.798). Experts categorized the AI-generated responses as incorrect or partially correct at a rate of 16.6% for ChatGPT-4 and 29.1% for Bard. Bard's answers contained significantly more misleading information than those of ChatGPT-4 (p=0.039).
Conclusion: AI chatbots have gained significant attention, and their performance is continuously improving. Nonetheless, these technologies still require further improvement before they can be considered reliable and credible sources of medical information for patients seeking guidance on 177Lu-PSMA-617 therapy.