Chatbot

  • Article type: Journal Article
    BACKGROUND: With the increasing application of large language models like ChatGPT in various industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research.
    OBJECTIVE: The aim of this study is to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability in the Chinese National Medical Licensing Examination (CNMLE).
    METHODS: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the GPT version (3.5 or 4.0), the prompt's designation of system roles tailored to medical subspecialties, and repetition for coherence. A passing accuracy threshold was established as 60%. χ2 tests and κ values were employed to evaluate the models' accuracy and consistency.
    RESULTS: GPT-4.0 achieved a passing accuracy of 72.7%, which was significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). However, both models showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%), and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy among different question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties, while GPT-3.5 did so in 7 of 15 on the first response.
    CONCLUSIONS: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role did not significantly enhance the model's reliability or answer coherence. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.
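    The Methods above name two statistics: a χ2 test for comparing the models' accuracies and κ for the consistency of repeated responses. A minimal sketch of both, using illustrative counts and hypothetical answer strings rather than the study's data or code, might look like this:

```python
# Illustrative only: counts approximate the reported accuracies; answers are made up.
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

N = 500                                     # CNMLE 2022 question count
correct = {"GPT-4.0": 364, "GPT-3.5": 270}  # roughly 72.7% and 54% accuracy

# 2x2 contingency table: correct vs incorrect answers for each model
table = [[correct["GPT-4.0"], N - correct["GPT-4.0"]],
         [correct["GPT-3.5"], N - correct["GPT-3.5"]]]
chi2, p, _, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, P = {p:.3g}")

# Consistency between two repeated runs of the same model (hypothetical answers)
run1 = ["A", "C", "B", "D", "A", "E"]
run2 = ["A", "C", "B", "B", "A", "E"]
print(f"kappa = {cohen_kappa_score(run1, run2):.3f}")
```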

  • Article type: Journal Article
    BACKGROUND: HIV pre-exposure prophylaxis (PrEP) is a critical biomedical strategy to prevent HIV transmission among cisgender women. Despite its proven effectiveness, Black cisgender women remain significantly underrepresented throughout the PrEP care continuum, facing barriers such as limited access to care, medical mistrust, and intersectional racial or HIV stigma. Addressing these disparities is vital to improving HIV prevention outcomes within this community. On the other hand, nurse practitioners (NPs) play a pivotal role in PrEP utilization but are underrepresented due to a lack of awareness, a lack of human resources, and insufficient support. Equipped with the rapid evolution of artificial intelligence (AI) and advanced large language models, chatbots effectively facilitate health care communication and linkage to care in various domains, including HIV prevention and PrEP care.
    OBJECTIVE: Our study harnesses NPs' holistic care capabilities and the power of AI through natural language processing algorithms, providing targeted, patient-centered facilitation for PrEP care. Our overarching goal is to create a nurse-led, stakeholder-inclusive, and AI-powered program to facilitate PrEP utilization among Black cisgender women, ultimately enhancing HIV prevention efforts in this vulnerable group in 3 phases. This project aims to mitigate health disparities and advance innovative, technology-based solutions.
    METHODS: The study uses a mixed methods design involving semistructured interviews with key stakeholders, including 50 PrEP-eligible Black women, 10 NPs, and a community advisory board representing various socioeconomic backgrounds. The AI-powered chatbot is developed using HumanX technology and SmartBot360's Health Insurance Portability and Accountability Act-compliant framework to ensure data privacy and security. The study spans 18 months and consists of 3 phases: exploration, development, and evaluation.
    RESULTS: As of May 2024, the institutional review board protocol for phase 1 has been approved. We plan to start recruitment for Black cisgender women and NPs in September 2024, with the aim to collect information to understand their preferences regarding chatbot development. While institutional review board approval for phases 2 and 3 is still in progress, we have made significant strides in networking for participant recruitment. We plan to conduct data collection soon, and further updates on the recruitment and data collection progress will be provided as the study advances.
    CONCLUSIONS: The AI-powered chatbot offers a novel approach to improving PrEP care utilization among Black cisgender women, with opportunities to reduce barriers to care and facilitate a stigma-free environment. However, challenges remain regarding health equity and the digital divide, emphasizing the need for culturally competent design and robust data privacy protocols. The implications of this study extend beyond PrEP care, presenting a scalable model that can address broader health disparities.
    INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/59975.

  • Article type: Journal Article
    To evaluate the response capabilities of ChatGPT 3.5 and an internet-connected GPT-4 engine (Microsoft Copilot) in a public healthcare system otolaryngology job competition examination, with the real scores of otolaryngology specialists as the control group. In September 2023, 135 questions divided into theoretical and practical parts were input into ChatGPT 3.5 and an internet-connected GPT-4. The accuracy of the AI responses was compared with the official results from the otolaryngologists who took the exam, and statistical analysis was conducted using Stata 14.2. Copilot (GPT-4) outperformed ChatGPT 3.5. Copilot achieved a score of 88.5 points, while ChatGPT scored 60 points. Both AIs had discrepancies in their incorrect answers. Despite ChatGPT's proficiency, Copilot displayed superior performance, ranking as the second-best score among the 108 otolaryngologists who took the exam, while ChatGPT placed 83rd. A chatbot powered by GPT-4 with internet access (Copilot) demonstrates superior performance in responding to multiple-choice medical questions compared to ChatGPT 3.5.
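    The ranking reported above (second-best vs 83rd among 108 specialists) is a simple comparison against the specialist score distribution. A minimal sketch with entirely hypothetical specialist scores, not the exam's real results, could look like this:

```python
# Hypothetical specialist scores; only the two chatbot scores come from the abstract.
import numpy as np

rng = np.random.default_rng(0)
specialist_scores = rng.uniform(40, 90, size=108)  # placeholder distribution

def rank_among(score: float, scores: np.ndarray) -> int:
    """1-based rank of `score` within the specialist cohort (1 = best)."""
    return int(np.sum(scores > score)) + 1

for name, score in {"Copilot (GPT-4)": 88.5, "ChatGPT 3.5": 60.0}.items():
    print(f"{name}: score {score}, rank {rank_among(score, specialist_scores)}")
```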

  • Article type: Journal Article
    BACKGROUND: Cigarette smoking poses a major public health risk. Chatbots may serve as an accessible and useful tool to promote cessation due to their high accessibility and potential in facilitating long-term personalized interactions. To increase effectiveness and acceptability, there remains a need to identify and evaluate counseling strategies for these chatbots, an aspect that has not been comprehensively addressed in previous research.
    OBJECTIVE: This study aims to identify effective counseling strategies for such chatbots to support smoking cessation. In addition, we sought to gain insights into smokers' expectations of and experiences with the chatbot.
    METHODS: This mixed methods study incorporated a web-based experiment and semistructured interviews. Smokers (N=229) interacted with either a motivational interviewing (MI)-style (n=112, 48.9%) or a confrontational counseling-style (n=117, 51.1%) chatbot. Both cessation-related (ie, intention to quit and self-efficacy) and user experience-related outcomes (ie, engagement, therapeutic alliance, perceived empathy, and interaction satisfaction) were assessed. Semistructured interviews were conducted with 16 participants, 8 (50%) from each condition, and data were analyzed using thematic analysis.
    RESULTS: Results from a multivariate ANOVA showed that participants had a significantly higher overall rating for the MI (vs confrontational counseling) chatbot. Follow-up discriminant analysis revealed that the better perception of the MI chatbot was mostly explained by the user experience-related outcomes, with cessation-related outcomes playing a lesser role. Exploratory analyses indicated that smokers in both conditions reported increased intention to quit and self-efficacy after the chatbot interaction. Interview findings illustrated several constructs (eg, affective attitude and engagement) explaining people\'s previous expectations and timely and retrospective experience with the chatbot.
    CONCLUSIONS: The results confirmed that chatbots are a promising tool in motivating smoking cessation and the use of MI can improve user experience. We did not find extra support for MI to motivate cessation and have discussed possible reasons. Smokers expressed both relational and instrumental needs in the quitting process. Implications for future research and practice are discussed.
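    The Results mention a multivariate ANOVA across the user experience outcomes. A minimal sketch of such an analysis, not the authors' code, with synthetic data and hypothetical column names, could be:

```python
# Synthetic data for illustration; column names and values are placeholders.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 229
df = pd.DataFrame({
    "condition": rng.choice(["MI", "confrontational"], size=n),
    "engagement": rng.normal(3.5, 1.0, n),
    "alliance": rng.normal(3.5, 1.0, n),
    "empathy": rng.normal(3.5, 1.0, n),
    "satisfaction": rng.normal(3.5, 1.0, n),
})

manova = MANOVA.from_formula(
    "engagement + alliance + empathy + satisfaction ~ condition", data=df
)
print(manova.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```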

  • Article type: Journal Article
    OBJECTIVE: Few individuals with eating disorders (EDs) receive treatment. Innovations are needed to identify individuals with EDs and address care barriers. We developed a chatbot for promoting services uptake that could be paired with online screening. However, it is not yet known which components drive effects. This study estimated individual and combined contributions of four chatbot components on mental health services use (primary), chatbot helpfulness, and attitudes toward changing eating/shape/weight concerns ("change attitudes," with higher scores indicating greater importance/readiness).
    METHODS: Two hundred five individuals who screened positive for an ED but were not in treatment were randomized in an optimization randomized controlled trial to receive up to four chatbot components: psychoeducation, motivational interviewing, personalized service recommendations, and repeated administration (follow-up check-ins/reminders). Assessments were at baseline and 2, 6, and 14 weeks.
    RESULTS: Participants who received repeated administration were more likely to report mental health services use, with no significant effects of other components on services use. Repeated administration slowed the decline in change attitudes participants experienced over time. Participants who received motivational interviewing found the chatbot more helpful, but this component was also associated with larger declines in change attitudes. Participants who received personalized recommendations found the chatbot more helpful, and receiving this component on its own was associated with the most favorable change attitude time trend. Psychoeducation showed no effects.
    CONCLUSIONS: Results indicated important effects of components on outcomes; findings will be used to finalize decision making about the optimized intervention package. The chatbot shows high potential for addressing the treatment gap for EDs.
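    The trial assigns each participant up to four chatbot components. The abstract does not state the exact factorial structure, so the sketch below assumes a full 2^4 factorial purely for illustration; the component names are taken from the Methods, everything else is a placeholder.

```python
# Illustrative assignment scheme, assuming a full 2^4 factorial design.
import random
from itertools import product

COMPONENTS = ["psychoeducation", "motivational_interviewing",
              "personalized_recommendations", "repeated_administration"]

# All 16 on/off combinations of the four components
conditions = list(product([False, True], repeat=len(COMPONENTS)))

def assign(participant_id: int) -> dict:
    """Randomly assign one participant to a component combination."""
    combo = random.choice(conditions)
    return {"id": participant_id, **dict(zip(COMPONENTS, combo))}

print(assign(1))
```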

  • Article type: Journal Article
    Chatbots can effect large-scale behaviour change because they are accessible through social media, flexible, and scalable, and they gather data automatically. Yet research on the feasibility and effectiveness of chatbot-administered behaviour change interventions is sparse. The effectiveness of established behaviour change interventions when implemented in chatbots is not guaranteed, given the unique human-machine interaction dynamics. We pilot-tested chatbot-based behaviour change through information provision and embedded animations. We evaluated whether the chatbot could increase understanding and intentions to adopt protective behaviours during the pandemic. Fifty-nine culturally and linguistically diverse participants received a compassion intervention, an exponential growth intervention, or no intervention. We measured participants' COVID-19 testing intentions and staying-home attitudes before and after their chatbot interaction. We found reduced uncertainty about protective behaviours. The exponential growth intervention increased participants' testing intentions. This study provides preliminary evidence that chatbots can spark behaviour change, with applications in diverse and underrepresented groups.

  • Article type: Journal Article
    This study investigates the acceptance of large language models (LLMs) among older adults using the Technology Acceptance Model (TAM). The research, conducted through a cross-sectional survey, explores the influence of perceived ease of use and perceived usefulness on intention to use among older adults. The results show that subjective norm, image, job relevance, output quality, result demonstrability, and perceived ease of use have significant positive and direct impacts on perceived usefulness (β=0.138, 0.240, 0.213, 0.280, 0.181, 0.176; P<0.05). Perceived ease of use and perceived usefulness have significant positive and direct impacts on intention to use (β=0.335, 0.307; P<0.05). The study's practical implications highlight the need for tailored chatbots, offering valuable insights for developers and policymakers aiming to enhance the integration of innovative technologies among older populations.
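    The two TAM paths reported above (antecedents to perceived usefulness, then ease of use and usefulness to intention) can be sketched as regressions. This is not the authors' model; ordinary least squares stands in for whatever path or structural equation model they used, and the column names and synthetic data are hypothetical.

```python
# Illustrative TAM path regressions on synthetic survey data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame(rng.normal(3, 1, size=(n, 8)), columns=[
    "subjective_norm", "image", "job_relevance", "output_quality",
    "result_demonstrability", "ease_of_use", "usefulness", "intention_to_use",
])

# Path 1: antecedents -> perceived usefulness
pu = smf.ols("usefulness ~ subjective_norm + image + job_relevance + "
             "output_quality + result_demonstrability + ease_of_use", data=df).fit()
# Path 2: ease of use and usefulness -> intention to use
bi = smf.ols("intention_to_use ~ ease_of_use + usefulness", data=df).fit()
print(pu.params, bi.params, sep="\n")
```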

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence (AI) chatbots have the potential to assist individuals with chronic health conditions by providing tailored information, monitoring symptoms, and offering mental health support. Despite their potential benefits, research on public attitudes toward health care chatbots is still limited. To effectively support individuals with long-term health conditions like long COVID (or post-COVID-19 condition), it is crucial to understand their perspectives and preferences regarding the use of AI chatbots.
    OBJECTIVE: This study has two main objectives: (1) provide insights into AI chatbot acceptance among people with chronic health conditions, particularly adults older than 55 years, and (2) explore the perceptions of using AI chatbots for health self-management and long COVID support.
    METHODS: A web-based survey study was conducted between January and March 2023, specifically targeting individuals with diabetes and other chronic conditions. This particular population was chosen due to their potential awareness and ability to self-manage their condition. The survey aimed to capture data at multiple intervals, taking into consideration the public launch of ChatGPT, which could have potentially impacted public opinions during the project timeline. The survey received 1310 clicks and garnered 900 responses, resulting in a total of 888 usable data points.
    RESULTS: Although past experience with chatbots (P<.001, 95% CI .110-.302) and online information seeking (P<.001, 95% CI .039-.084) were strong indicators of respondents' future adoption of health chatbots, respondents were generally skeptical or unsure about the use of AI chatbots for health care purposes. Less than one-third of the respondents (n=203, 30.1%) indicated that they were likely to use a health chatbot in the next 12 months if available. Most were uncertain about a chatbot's capability to provide accurate medical advice. However, people seemed more receptive to using voice-based chatbots for mental well-being, health data collection, and analysis. Half of the respondents with long COVID showed interest in using emotionally intelligent chatbots.
    CONCLUSIONS: AI hesitancy is not uniform across all health domains and user groups. Despite persistent AI hesitancy, there are promising opportunities for chatbots to offer support for chronic conditions in areas of lifestyle enhancement and mental well-being, potentially through voice-based user interfaces.

  • Article type: Journal Article
    BACKGROUND: Health outcomes are significantly influenced by unmet social needs. Although screening for social needs has become common in health care settings, there is often poor linkage to resources after needs are identified. The structural barriers (eg, staffing, time, and space) to helping address social needs could be overcome by a technology-based solution.
    OBJECTIVE: This study aims to present the design and evaluation of a chatbot, DAPHNE (Dialog-Based Assistant Platform for Healthcare and Needs Ecosystem), which screens for social needs and links patients and families to resources.
    METHODS: This research used a three-stage study approach: (1) an end-user survey to understand unmet needs and perception toward chatbots, (2) iterative design with interdisciplinary stakeholder groups, and (3) a feasibility and usability assessment. In study 1, a web-based survey was conducted with low-income US resident households (n=201). Following that, in study 2, web-based sessions were held with an interdisciplinary group of stakeholders (n=10), using thematic and content analysis to inform the chatbot's design and development. Finally, in study 3, the assessment of feasibility and usability was completed via a mix of a web-based survey and focus group interviews following scenario-based usability testing with community health workers (family advocates; n=4) and social workers (n=9). We reported descriptive statistics and chi-square test results for the household survey. Content analysis and thematic analysis were used to analyze qualitative data. The usability score was reported descriptively.
    RESULTS: Among the survey participants, employed and younger individuals reported a higher likelihood of using a chatbot to address social needs, in contrast to the oldest age group. Regarding designing the chatbot, the stakeholders emphasized the importance of provider-technology collaboration, inclusive conversational design, and user education. The participants found that the chatbot's capabilities met expectations and that the chatbot was easy to use (System Usability Scale score=72/100). However, there were common concerns about the accuracy of suggested resources, electronic health record integration, and trust in the chatbot.
    CONCLUSIONS: Chatbots can provide personalized feedback for families to identify and meet social needs. Our study highlights the importance of user-centered iterative design and development of chatbots for social needs. Future research should examine the efficacy, cost-effectiveness, and scalability of chatbot interventions to address social needs.
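    The 72/100 reported above comes from the standard System Usability Scale, whose scoring rule is well established: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is scaled by 2.5 to a 0-100 range. A minimal sketch of that calculation (the example responses are made up):

```python
# Standard SUS scoring for one respondent's 10 items, each rated 1-5.
def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # -> 75.0
```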

  • Article type: Journal Article
    BACKGROUND: Large language models show promise for improving radiology workflows, but their performance on structured radiological tasks such as Reporting and Data Systems (RADS) categorization remains unexplored.
    OBJECTIVE: This study aims to evaluate 3 large language model chatbots-Claude-2, GPT-3.5, and GPT-4-on assigning RADS categories to radiology reports and assess the impact of different prompting strategies.
    METHODS: This cross-sectional study compared 3 chatbots using 30 radiology reports (10 per RADS criteria), with a 3-level prompting strategy: zero-shot, few-shot, and guideline PDF-informed prompts. The cases were grounded in Liver Imaging Reporting & Data System (LI-RADS) version 2018, Lung CT (computed tomography) Screening Reporting & Data System (Lung-RADS) version 2022, and Ovarian-Adnexal Reporting & Data System (O-RADS) magnetic resonance imaging, meticulously prepared by board-certified radiologists. Each report underwent 6 assessments. Two blinded reviewers assessed the chatbots' responses at patient-level RADS categorization and overall ratings. The agreement across repetitions was assessed using Fleiss κ.
    RESULTS: Claude-2 achieved the highest accuracy in overall ratings with few-shot prompts and guideline PDFs (prompt-2), attaining 57% (17/30) average accuracy over 6 runs and 50% (15/30) accuracy with k-pass voting. Without prompt engineering, all chatbots performed poorly. The introduction of a structured exemplar prompt (prompt-1) increased the accuracy of overall ratings for all chatbots. Providing prompt-2 further improved Claude-2's performance, an enhancement not replicated by GPT-4. The interrun agreement was substantial for Claude-2 (κ=0.66 for overall rating and κ=0.69 for RADS categorization), fair for GPT-4 (κ=0.39 for both), and fair for GPT-3.5 (κ=0.21 for overall rating and κ=0.39 for RADS categorization). All chatbots showed significantly higher accuracy with LI-RADS version 2018 than with Lung-RADS version 2022 and O-RADS (P<.05); with prompt-2, Claude-2 achieved the highest overall rating accuracy of 75% (45/60) in LI-RADS version 2018.
    CONCLUSIONS: When equipped with structured prompts and guideline PDFs, Claude-2 demonstrated potential in assigning RADS categories to radiology cases according to established criteria such as LI-RADS version 2018. However, the current generation of chatbots lags in accurately categorizing cases based on more recent RADS criteria.
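    A minimal sketch, not the authors' pipeline, of the 3-level prompting strategy and of measuring inter-run agreement with Fleiss κ. The prompt wording, exemplars, guideline excerpts, and the category assignments below are hypothetical placeholders.

```python
# Illustrative prompt assembly and inter-run agreement calculation.
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def build_prompt(report: str, level: int, exemplars: str = "", guideline: str = "") -> str:
    """Assemble a zero-shot (1), few-shot (2), or guideline-PDF-informed (3) prompt."""
    parts = ["Assign the appropriate RADS category to the radiology report below."]
    if level >= 2:
        parts.append("Worked examples:\n" + exemplars)    # structured exemplar prompt
    if level >= 3:
        parts.append("Guideline excerpt:\n" + guideline)  # text taken from the RADS PDF
    parts.append("Report:\n" + report + "\nCategory:")
    return "\n\n".join(parts)

# Hypothetical categories returned for 4 reports across 6 repeated runs
runs = [["LR-4", "LR-4", "LR-5", "LR-4", "LR-4", "LR-4"],
        ["LR-3", "LR-3", "LR-3", "LR-3", "LR-3", "LR-3"],
        ["O-RADS 4", "O-RADS 4", "O-RADS 3", "O-RADS 4", "O-RADS 4", "O-RADS 5"],
        ["Lung-RADS 2", "Lung-RADS 2", "Lung-RADS 2", "Lung-RADS 2", "Lung-RADS 2", "Lung-RADS 2"]]
cats = sorted({c for run in runs for c in run})
coded = [[cats.index(c) for c in run] for run in runs]
counts, _ = aggregate_raters(coded)  # rows = reports, columns = category counts
print(f"Fleiss kappa across runs: {fleiss_kappa(counts):.2f}")
```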