BARD

  • Article type: Journal Article
    Artificial intelligence chatbots based on large language models have recently emerged as an alternative to traditional online searches and are also entering the nutrition space. In this study, we wanted to investigate whether the artificial intelligence chatbots ChatGPT and Bard (now Gemini) can create meal plans that meet the dietary reference intakes (DRIs) for different dietary patterns. We further hypothesized that nutritional adequacy could be improved by modifying the prompts used. Meal plans were generated by 3 accounts for different dietary patterns (omnivorous, vegetarian, and vegan) using 2 distinct prompts, resulting in 108 meal plans in total. The nutrient content of the plans was subsequently analyzed and compared to the DRIs. On average, the meal plans contained less energy and carbohydrate than recommended but mostly exceeded the DRI for protein. Vitamin D and fluoride fell below the DRI in all plans, whereas only the vegan plans contained insufficient vitamin B12. ChatGPT suggested using vitamin B12 supplements in 5 of 18 instances, whereas Bard never recommended supplements. There were no significant differences between the prompts or the tools. Although the meal plans generated by ChatGPT and Bard met most DRIs, there were some exceptions, particularly for vegan diets. These tools may be useful for individuals looking for general dietary inspiration, but they should not be relied on to create nutritionally adequate meal plans, especially for individuals with restrictive dietary needs.
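
    The analysis step described here, totaling each generated plan's nutrients and comparing the totals to the DRIs, is straightforward to script. Below is a minimal sketch of that comparison in Python; the DRI thresholds and plan values are illustrative placeholders, not the study's reference data.

```python
# Sketch: flag nutrients in an LLM-generated meal plan that fall short of the DRI.
# All numbers are illustrative placeholders, not the study's reference values.

DRI = {  # hypothetical daily reference intakes
    "energy_kcal": 2000,
    "protein_g": 50,
    "vitamin_d_ug": 15,
    "vitamin_b12_ug": 2.4,
    "fluoride_mg": 3.0,
}

def check_plan(plan_totals: dict[str, float]) -> dict[str, float]:
    """Return each nutrient's intake as a fraction of its DRI."""
    return {n: plan_totals[n] / DRI[n] for n in DRI if n in plan_totals}

# Example: daily totals for one hypothetical chatbot-generated vegan plan.
vegan_plan = {
    "energy_kcal": 1750,
    "protein_g": 68,
    "vitamin_d_ug": 4.2,
    "vitamin_b12_ug": 0.3,
    "fluoride_mg": 0.9,
}

for nutrient, ratio in check_plan(vegan_plan).items():
    status = "OK" if ratio >= 1.0 else "below DRI"
    print(f"{nutrient}: {ratio:.0%} of DRI ({status})")
```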

  • Article type: Journal Article
    BACKGROUND: Qualitative methods are highly beneficial to the dissemination and implementation of new digital health interventions; however, these methods can be time-intensive and slow down dissemination when timely knowledge from the data sources is needed in ever-changing health systems. Recent advancements in generative artificial intelligence (GenAI) and their underlying large language models (LLMs) may provide a promising opportunity to expedite the qualitative analysis of textual data, but their efficacy and reliability remain unknown.
    OBJECTIVE: The primary objectives of our study were to evaluate the consistency in themes, reliability of coding, and time needed for inductive and deductive thematic analyses between GenAI (ie, ChatGPT and Bard) and human coders.
    METHODS: The qualitative data for this study consisted of 40 brief SMS text message reminder prompts used in a digital health intervention for promoting antiretroviral medication adherence among people with HIV who use methamphetamine. Inductive and deductive thematic analyses of these SMS text messages were conducted by 2 independent teams of human coders. An independent human analyst conducted analyses following both approaches using ChatGPT and Bard. The consistency in themes (or the extent to which the themes were the same) and reliability (or agreement in coding of themes) between methods were compared.
    RESULTS: The themes generated by GenAI (both ChatGPT and Bard) were consistent with 71% (5/7) of the themes identified by human analysts following inductive thematic analysis. The consistency in themes was lower between humans and GenAI following a deductive thematic analysis procedure (ChatGPT: 6/12, 50%; Bard: 7/12, 58%). The percentage agreement (or intercoder reliability) for these congruent themes between human coders and GenAI ranged from fair to moderate (ChatGPT, inductive: 31/66, 47%; ChatGPT, deductive: 22/59, 37%; Bard, inductive: 20/54, 37%; Bard, deductive: 21/58, 36%). In general, ChatGPT and Bard performed similarly to each other across both types of qualitative analyses in terms of consistency of themes (inductive: 6/6, 100%; deductive: 5/6, 83%) and reliability of coding (inductive: 23/62, 37%; deductive: 22/47, 47%). On average, GenAI required significantly less overall time than human coders when conducting qualitative analysis (mean 20, SD 3.5 minutes vs mean 567, SD 106.5 minutes).
    CONCLUSIONS: The promising consistency in the themes generated by human coders and GenAI suggests that these technologies hold promise in reducing the resource intensiveness of qualitative thematic analysis; however, the relatively lower reliability in coding between them suggests that hybrid approaches are necessary. Human coders appeared to be better than GenAI at identifying nuanced and interpretative themes. Future studies should consider how these powerful technologies can be best used in collaboration with human coders to improve the efficiency of qualitative research in hybrid approaches while also mitigating potential ethical risks that they may pose.
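
    The percentage agreement reported above reduces to a one-line computation once each coder's theme assignments are expressed as parallel label vectors. A minimal sketch, using invented binary codings rather than the study's data:

```python
# Sketch: percentage agreement (intercoder reliability) between a human coder
# and a GenAI coder. The 0/1 theme assignments below are invented.

human_codes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # theme present per message
genai_codes = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

agreements = sum(h == g for h, g in zip(human_codes, genai_codes))
print(f"Percent agreement: {agreements}/{len(human_codes)} "
      f"= {agreements / len(human_codes):.0%}")
```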

  • Article type: Journal Article
    BACKGROUND: Many patients use artificial intelligence (AI) chatbots as a rapid source of health information. This raises important questions about the reliability and effectiveness of AI chatbots in delivering accurate and understandable information.
    OBJECTIVE: To evaluate and compare the accuracy, conciseness, and readability of responses from OpenAI ChatGPT-4 and Google Bard to patient inquiries concerning the novel 177Lu-PSMA-617 therapy for prostate cancer.
    METHODS: Two experts listed the 12 most commonly asked questions by patients on 177Lu-PSMA-617 therapy. These twelve questions were prompted to OpenAI ChatGPT-4 and Google Bard. AI-generated responses were distributed using an online survey platform (Qualtrics) and blindly rated by eight experts. The performances of the AI chatbots were evaluated and compared across three domains: accuracy, conciseness, and readability. Additionally, potential safety concerns associated with AI-generated answers were examined. The Mann-Whitney U and chi-square tests were used to compare the performances of the AI chatbots.
    RESULTS: Eight experts participated in the survey, evaluating 12 AI-generated responses across the three domains of accuracy, conciseness, and readability, resulting in 96 assessments (12 responses × 8 experts) per domain for each chatbot. ChatGPT-4 provided more accurate answers than Bard (2.95 ± 0.671 vs 2.73 ± 0.732, p=0.027). Bard's responses had better readability than ChatGPT-4's (2.79 ± 0.408 vs 2.94 ± 0.243, p=0.003). Both ChatGPT-4 and Bard achieved comparable conciseness scores (3.14 ± 0.659 vs 3.11 ± 0.679, p=0.798). Experts categorized the AI-generated responses as incorrect or partially correct at a rate of 16.6% for ChatGPT-4 and 29.1% for Bard. Bard's answers contained significantly more misleading information than those of ChatGPT-4 (p=0.039).
    CONCLUSIONS: AI chatbots have gained significant attention, and their performance is continuously improving. Nonetheless, these technologies still need further improvement to be considered reliable and credible sources for patients seeking medical information on 177Lu-PSMA-617 therapy.
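
    The abstract names the Mann-Whitney U test for the between-chatbot comparisons of expert ratings. A minimal sketch of that test with scipy (the implementation choice is an assumption, and the ratings below are invented, not the published scores):

```python
# Sketch: Mann-Whitney U test on expert accuracy ratings for two chatbots.
# Ratings are invented; the study collected 96 assessments per domain per bot.
from scipy.stats import mannwhitneyu

chatgpt_scores = [3, 3, 2, 3, 4, 3, 2, 3, 3, 4, 3, 3]
bard_scores    = [2, 3, 2, 2, 3, 3, 2, 3, 2, 3, 3, 2]

stat, p = mannwhitneyu(chatgpt_scores, bard_scores, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")
```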

  • Article type: Journal Article
    This study explores disparities and opportunities in healthcare information provided by AI chatbots. We focused on recommendations for adjuvant therapy in endometrial cancer, analyzing responses across four regions (Indonesia, Nigeria, Taiwan, USA) and three platforms (Bard, Bing, ChatGPT-3.5). Utilizing previously published cases, we asked identical questions to chatbots from each location within a 24-h window. Responses were evaluated in a double-blinded manner on relevance, clarity, depth, focus, and coherence by ten experts in endometrial cancer. Our analysis revealed significant variations across different countries/regions (p < 0.001). Interestingly, Bing's responses in Nigeria consistently outperformed others (p < 0.05), excelling in all evaluation criteria (p < 0.001). Bard also performed better in Nigeria compared to other regions (p < 0.05), consistently surpassing them across all categories (p < 0.001, with relevance reaching p < 0.01). Notably, Bard's overall scores were significantly higher than those of ChatGPT-3.5 and Bing in all locations (p < 0.001). These findings highlight disparities and opportunities in the quality of AI-powered healthcare information based on user location and platform. This emphasizes the necessity for more research and development to guarantee equal access to trustworthy medical information through AI technologies.
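
    The abstract does not name the test behind its cross-region comparison, so the sketch below uses a Kruskal-Wallis test, one plausible choice for ordinal expert scores across more than two groups. The scores are invented for illustration:

```python
# Sketch: comparing expert scores for one chatbot across four regions.
# Kruskal-Wallis is an assumption; the abstract does not name its test.
from scipy.stats import kruskal

scores_indonesia = [3, 4, 3, 4, 3, 4, 3, 3, 4, 3]
scores_nigeria   = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]
scores_taiwan    = [3, 3, 4, 3, 4, 3, 3, 4, 3, 3]
scores_usa       = [4, 3, 4, 4, 3, 4, 4, 3, 4, 4]

h, p = kruskal(scores_indonesia, scores_nigeria, scores_taiwan, scores_usa)
print(f"H = {h:.2f}, p = {p:.4f}")
```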

  • Article type: Journal Article
    OBJECTIVE: In the digital age, patients turn to online sources for lumbar spine fusion information, necessitating a careful study of large language models (LLMs) like chat generative pre-trained transformer (ChatGPT) for patient education.
    METHODS: Our study aims to assess the response quality of OpenAI's ChatGPT 3.5 and Google's Bard to patient questions on lumbar spine fusion surgery. We identified 10 critical questions from 158 frequently asked ones via Google search, which were then presented to both chatbots. Five blinded spine surgeons rated the responses on a 4-point scale from 'unsatisfactory' to 'excellent'. The clarity and professionalism of the answers were also evaluated using a 5-point Likert scale.
    RESULTS: In our evaluation of 10 questions across ChatGPT 3.5 and Bard, 97% of responses were rated as excellent or satisfactory. Specifically, ChatGPT had 62% excellent and 32% minimally clarifying responses, with only 6% needing moderate or substantial clarification. Bard's responses were 66% excellent and 24% minimally clarifying, with 10% requiring more clarification. No significant difference was found in the overall rating distribution between the 2 models. Both struggled with 3 specific questions regarding surgical risks, success rates, and selection of surgical approaches (Q3, Q4, and Q5). Interrater reliability was low for both models (ChatGPT: k = 0.041, p = 0.622; Bard: k = -0.040, p = 0.601). While both scored well on understanding and empathy, Bard received marginally lower ratings in empathy and professionalism.
    CONCLUSIONS: ChatGPT 3.5 and Bard effectively answered lumbar spine fusion FAQs, but further training and research are needed to solidify LLMs' role in medical education and healthcare communication.
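
    The interrater reliability figures above are Cohen's kappa values. A minimal sketch of that calculation for one pair of raters, using scikit-learn (an implementation choice assumed here) and invented category labels:

```python
# Sketch: Cohen's kappa for two raters' categorical ratings of chatbot answers.
# Labels are invented; the study reported k = 0.041 (ChatGPT) and k = -0.040 (Bard).
from sklearn.metrics import cohen_kappa_score

rater_a = ["excellent", "excellent", "minimal", "moderate", "excellent",
           "minimal", "excellent", "substantial", "minimal", "excellent"]
rater_b = ["excellent", "minimal", "minimal", "excellent", "excellent",
           "moderate", "minimal", "excellent", "minimal", "excellent"]

print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.3f}")
```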

  • Article type: Journal Article
    BACKGROUND: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about the difference in their ability to generate abstracts. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy.
    OBJECTIVE: The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery.
    METHODS: In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors.
    RESULTS: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively.
    CONCLUSIONS: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard.
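
    Reviewer performance at spotting AI-generated abstracts reduces to standard binary-classification metrics: sensitivity, specificity, and (for the detector's continuous scores) AUC. A sketch of those computations with invented judgments and scores, not the study's data:

```python
# Sketch: sensitivity/specificity of human reviewers and AUC of an AI detector.
# Labels and scores below are invented for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# 1 = actually AI-generated, 0 = human-written
truth   = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
flagged = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])  # reviewer says "AI"

tn, fp, fn, tp = confusion_matrix(truth, flagged).ravel()
print(f"sensitivity = {tp / (tp + fn):.2f}, specificity = {tn / (tn + fp):.2f}")

# Detector outputs a probability-like score per abstract; AUC summarizes it.
detector_scores = np.array([0.9, 0.4, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.95, 0.5])
print(f"AUC = {roc_auc_score(truth, detector_scores):.3f}")
```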

  • Article type: Journal Article
    OBJECTIVE: The Sequential Organ Failure Assessment (SOFA) score plays a crucial role in intensive care units (ICUs) by providing a reliable measure of a patient's organ function or extent of failure. However, precise assessment is time-consuming, and daily assessment in clinical practice in the ICU can be challenging.
    METHODS: Realistic scenarios in an ICU setting were created, and the data mining precision of ChatGPT 4.0 Plus, Bard, and Perplexity AI was assessed using Spearman's correlation coefficient as well as the intraclass correlation coefficient with respect to accuracy in determining the SOFA score.
    RESULTS: The strongest correlation was observed between the actual SOFA score and the score calculated by ChatGPT 4.0 Plus (r = 0.92, p<0.001). In contrast, the correlation between the actual SOFA and that calculated by Bard was moderate (r=0.59, p=0.070), while the correlation with Perplexity AI was substantial (r = 0.89, p<0.001). The intraclass correlation coefficient of the actual SOFA scores with those of ChatGPT 4.0 Plus, Bard, and Perplexity AI was ICC=0.94.
    CONCLUSIONS: Artificial intelligence (AI) tools, particularly ChatGPT 4.0 Plus, show significant promise in assisting with automated SOFA score calculations via AI data mining in ICU settings. They offer a pathway to reduce the manual workload and increase the efficiency of continuous patient monitoring and assessment. However, further development and validation are necessary to ensure accuracy and reliability in a critical care environment.
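
    The agreement analysis pairs a Spearman correlation with an intraclass correlation coefficient. A sketch of both under invented scores, using scipy for Spearman and the pingouin package for the ICC (the library choices are assumptions; pingouin's intraclass_corr expects long-format data):

```python
# Sketch: Spearman correlation and ICC between actual and AI-derived SOFA scores.
# Scores are invented; the study reported r = 0.92 for ChatGPT 4.0 Plus.
import pandas as pd
from scipy.stats import spearmanr
import pingouin as pg

actual  = [4, 7, 11, 2, 9, 14, 6, 3, 10, 8]
chatgpt = [4, 8, 11, 2, 9, 13, 6, 4, 10, 7]

rho, p = spearmanr(actual, chatgpt)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")

# Long format: one row per (scenario, rater) pair for the ICC.
df = pd.DataFrame({
    "scenario": list(range(10)) * 2,
    "rater": ["actual"] * 10 + ["chatgpt"] * 10,
    "score": actual + chatgpt,
})
icc = pg.intraclass_corr(data=df, targets="scenario", raters="rater",
                         ratings="score")
print(icc[["Type", "ICC"]])
```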

  • Article type: Journal Article
    In postoperative care, patient education and follow-up are pivotal for enhancing the quality of care and satisfaction. Artificial intelligence virtual assistants (AIVA) and large language models (LLMs) like Google BARD and ChatGPT-4 offer avenues for addressing patient queries using natural language processing (NLP) techniques. However, the accuracy and appropriateness of the information vary across these platforms, necessitating a comparative study to evaluate their efficacy in this domain. We conducted a study comparing AIVA (using Google Dialogflow) with ChatGPT-4 and Google BARD, assessing the accuracy, knowledge gap, and response appropriateness. AIVA demonstrated superior performance, with significantly higher accuracy (mean: 0.9) and lower knowledge gap (mean: 0.1) compared to BARD and ChatGPT-4. Additionally, AIVA's responses received higher Likert scores for appropriateness. Our findings suggest that specialized AI tools like AIVA are more effective in delivering precise and contextually relevant information for postoperative care compared to general-purpose LLMs. While ChatGPT-4 shows promise, its performance varies, particularly in verbal interactions. This underscores the importance of tailored AI solutions in healthcare, where accuracy and clarity are paramount. Our study highlights the necessity for further research and the development of customized AI solutions to address specific medical contexts and improve patient outcomes.

  • Article type: Journal Article
    BACKGROUND: Generative Artificial Intelligence is a technology that provides greater connectivity with people through conversational bots («chatbots»). These bots can engage in dialogue using natural language indistinguishable from humans and are a potential source of information for patients. The aim of this study is to examine the performance of these bots in solving specific issues related to orthopedic surgery and traumatology, using questions from the Spanish MIR exam between 2008 and 2023.
    METHODS: Three «chatbot» models (ChatGPT, Bard and Perplexity) were analyzed by answering 114 questions from the MIR. Their accuracy was compared, the readability of their responses was evaluated, and their dependence on logical reasoning and internal and external information was examined. The type of error was also evaluated in the failures.
    RESULTS: ChatGPT obtained 72.81% correct answers, followed by Perplexity (67.54%) and Bard (60.53%). Bard provided the most readable and comprehensive responses. The responses demonstrated logical reasoning and the use of internal information from the question prompts. In 16 questions (14%), all 3 applications failed simultaneously. Errors were identified, including logical and information failures.
    CONCLUSIONS: While conversational bots can be useful in resolving medical questions, caution is advised due to the possibility of errors. Currently, they should be considered as a developing tool, and human opinion should prevail over Generative Artificial Intelligence.
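
    The abstract does not name the test behind its accuracy comparison, so the sketch below uses a chi-square test of homogeneity on correct/incorrect counts as one plausible choice. The counts are derived from the reported accuracies over the 114 questions (83, 77, and 69 correct answers):

```python
# Sketch: chi-square test on correct/incorrect counts for three chatbots.
# Chi-square is an assumption; counts follow the reported accuracies.
from scipy.stats import chi2_contingency

#            correct  incorrect
table = [
    [83, 31],   # ChatGPT (72.81% of 114)
    [77, 37],   # Perplexity (67.54%)
    [69, 45],   # Bard (60.53%)
]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```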

  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) might offer a solution for the lack of trained health personnel, particularly in low- and middle-income countries. However, their strengths and weaknesses remain unclear.
    OBJECTIVE: Here we benchmark different LLMs (Bard 2023.07.13, Claude 2, ChatGPT 4) against six consultants in otorhinolaryngology (ORL).
    METHODS: Case-based questions were extracted from the literature and German state examinations. Answers from Bard 2023.07.13, Claude 2, ChatGPT 4, and six ORL consultants were rated blindly on a 6-point Likert scale for medical adequacy, comprehensibility, coherence, and conciseness. The given answers were compared to validated answers and evaluated for hazards. A modified Turing test was performed, and character counts were compared.
    RESULTS: The LLMs' answers ranked inferior to the consultants' in all categories. Yet, the difference between consultants and LLMs was marginal, with the clearest disparity in conciseness and the smallest in comprehensibility. Among the LLMs, Claude 2 was rated best in medical adequacy and conciseness. Consultants' answers matched the validated solution in 93% (228/246) of cases, ChatGPT 4 in 85% (35/41), Claude 2 in 78% (32/41), and Bard 2023.07.13 in 59% (24/41). Answers were rated as potentially hazardous in 10% (24/246) of cases for ChatGPT 4, 14% (34/246) for Claude 2, 19% (46/246) for Bard 2023.07.13, and 6% (71/1230) for consultants.
    CONCLUSIONS: Despite the consultants' superior performance, LLMs show potential for clinical application in ORL. Future studies should assess their performance at larger scale.
