BingAI

  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) are becoming increasingly important as they are being used more frequently for providing medical information. Our aim is to evaluate the effectiveness of electronic artificial intelligence (AI) large language models (LLMs), such as ChatGPT-4, BingAI, and Gemini in responding to patient inquiries about retinopathy of prematurity (ROP).
    METHODS: The answers of LLMs for fifty real-life patient inquiries were assessed using a 5-point Likert scale by three ophthalmologists. The models' responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index.
    RESULTS: ChatGPT-4 outperformed BingAI and Gemini, scoring the highest with 5 points in 90% (45 out of 50) and achieving ratings of "agreed" or "strongly agreed" in 98% (49 out of 50) of responses. It led in accuracy and reliability with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini was noted for the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed particularly in the screening, diagnosis, and treatment categories.
    CONCLUSIONS: ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per DISCERN and EQIP assessments.
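    The readability metrics reported above (FRE, FKGL, Coleman-Liau) are standard closed-form formulas over word, sentence, syllable, and letter counts. As a rough illustration only, here is a minimal Python sketch; the vowel-group syllable counter is a crude approximation, not the tokenizer the study used:

```python
import re

def count_syllables(word):
    # Crude heuristic: one syllable per run of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    # Sentence count from terminal punctuation; word count from letter runs.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = len(words)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w) for w in words)

    # Flesch Reading Ease: higher = easier to read.
    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    # Flesch-Kincaid Grade Level: approximate U.S. school grade.
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    # Coleman-Liau Index: uses letters and sentences per 100 words.
    cli = (0.0588 * (letters / n_words * 100)
           - 0.296 * (sentences / n_words * 100) - 15.8)
    return fre, fkgl, cli
```

    For example, `readability("The cat sat on the mat.")` yields an FRE near the top of the scale, consistent with very simple text; the study's FRE of 39.1 for Gemini indicates college-level difficulty.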

  • Article type: Journal Article
    Artificial intelligence (AI) chatbots utilizing large language models (LLMs) have recently garnered significant interest due to their ability to generate humanlike responses to user inquiries in an interactive dialog format. While these models are being increasingly utilized to obtain medical information by patients, scientific and medical providers, and trainees to address biomedical questions, their performance may vary from field to field. The opportunities and risks these chatbots pose to the widespread understanding of skeletal health and science are unknown. Here we assess the performance of 3 high-profile LLM chatbots, Chat Generative Pre-Trained Transformer (ChatGPT) 4.0, BingAI, and Bard, to address 30 questions in 3 categories: basic and translational skeletal biology, clinical practitioner management of skeletal disorders, and patient queries, to assess the accuracy and quality of the responses. Thirty questions in each of these categories were posed, and responses were independently graded for their degree of accuracy by four reviewers. While each of the chatbots was often able to provide relevant information about skeletal disorders, the quality and relevance of these responses varied widely, and ChatGPT 4.0 had the highest overall median score in each of the categories. Each of these chatbots displayed distinct limitations that included inconsistent, incomplete, or irrelevant responses, inappropriate utilization of lay sources in a professional context, a failure to take patient demographics or clinical context into account when providing recommendations, and an inability to consistently identify areas of uncertainty in the relevant literature. Careful consideration of both the opportunities and risks of current AI chatbots is needed to formulate guidelines for best practices for their use as a source of information about skeletal health and biology.
    Artificial intelligence chatbots are increasingly used as a source of information in health care and research settings due to their accessibility and ability to summarize complex topics using conversational language. However, it is still unclear whether they can provide accurate information for questions related to the medicine and biology of the skeleton. Here, we tested the performance of three prominent chatbots—ChatGPT, Bard, and BingAI—by tasking them with a series of prompts based on well-established skeletal biology concepts, realistic physician–patient scenarios, and potential patient questions. Despite their similarities in function, differences in the accuracy of responses were observed across the three different chatbot services. While in some contexts, chatbots performed well, and in other cases, strong limitations were observed, including inconsistent consideration of clinical context and patient demographics, occasionally providing incorrect or out-of-date information, and citation of inappropriate sources. With careful consideration of their current weaknesses, artificial intelligence chatbots offer the potential to transform education on skeletal health and science.
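    The grading scheme above (four reviewers independently scoring each response, then comparing overall median scores per category) can be sketched in a few lines of Python. All grades and category names below are hypothetical placeholders, not the study's data:

```python
from statistics import median

# Hypothetical 1-5 accuracy grades: for each category, a list of
# responses, each graded independently by four reviewers.
grades = {
    "basic_biology":   [[5, 4, 5, 5], [3, 4, 4, 3]],
    "clinical_mgmt":   [[4, 4, 5, 4], [2, 3, 3, 2]],
    "patient_queries": [[5, 5, 4, 5], [4, 4, 4, 5]],
}

def category_medians(grades):
    # Median across the four reviewers for each response,
    # then the overall median of those per-response medians.
    return {cat: median(median(resp) for resp in responses)
            for cat, responses in grades.items()}
```

    Using the median rather than the mean dampens the effect of a single outlier reviewer, which is one plausible reason studies of this kind report median scores.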
