Conversational agents

  • Article type: Journal Article
    Whilst chatbots for mental health are becoming increasingly prevalent, research on user experiences and expectations is relatively scarce and also equivocal on their acceptability and utility. This paper asks how people formulate their understandings of what might be appropriate in this space. We draw on data from a group of non-users who have experienced a need for support, and so can imagine the self as therapeutic target, enabling us to tap into their imaginative speculations of the self in relation to the chatbot other and the forms of agency they see as being at play, unconstrained by a specific actual chatbot. Analysis points towards ambiguity over some key issues: whether the apps were seen as having a role in specific episodes of mental health or in relation to an ongoing project of supporting wellbeing; whether the chatbot could be viewed as having a therapeutic agency or was a mere tool; and how far these issues related to matters of the user's personal qualities or the specific nature of the mental health condition. A range of traditions, norms and practices were used to construct diverse expectations on whether chatbots could offer a solution to cost-effective mental health support at scale.

  • Article type: Journal Article
    BACKGROUND: Although history taking is fundamental for diagnosing medical conditions, teaching and providing feedback on the skill can be challenging due to resource constraints. Virtual simulated patients and web-based chatbots have thus emerged as educational tools, with recent advancements in artificial intelligence (AI) such as large language models (LLMs) enhancing their realism and potential to provide feedback.
    OBJECTIVE: In our study, we aimed to evaluate the effectiveness of a Generative Pretrained Transformer (GPT) 4 model to provide structured feedback on medical students' performance in history taking with a simulated patient.
    METHODS: We conducted a prospective study involving medical students performing history taking with a GPT-powered chatbot. To that end, we designed a chatbot to simulate patients' responses and provide immediate feedback on the comprehensiveness of the students' history taking. Students' interactions with the chatbot were analyzed, and feedback from the chatbot was compared with feedback from a human rater. We measured interrater reliability and performed a descriptive analysis to assess the quality of feedback.
    RESULTS: Most of the study's participants were in their third year of medical school. A total of 1894 question-answer pairs from 106 conversations were included in our analysis. GPT-4's role-play and responses were medically plausible in more than 99% of cases. Interrater reliability between GPT-4 and the human rater showed "almost perfect" agreement (Cohen κ=0.832). Lower agreement (κ<0.6), detected for 8 of the 45 feedback categories, highlighted topics on which the model's assessments were overly specific or diverged from human judgment.
    CONCLUSIONS: The GPT model was effective in providing structured feedback on history-taking dialogs provided by medical students. Although we unraveled some limitations regarding the specificity of feedback for certain feedback categories, the overall high agreement with human raters suggests that LLMs can be a valuable tool for medical education. Our findings, thus, advocate the careful integration of AI-driven feedback mechanisms in medical training and highlight important aspects when LLMs are used in that context.
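The interrater agreement above is Cohen's κ, which corrects raw percent agreement for agreement expected by chance. A minimal sketch in Python; the labels below are hypothetical examples, not the study's data:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Observed proportion of items on which both raters agree
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labeled independently at their own base rates
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical per-category feedback labels from GPT-4 and a human rater
gpt = ["covered", "covered", "missed", "covered", "missed", "missed"]
human = ["covered", "covered", "missed", "missed", "missed", "missed"]
kappa = cohen_kappa(gpt, human)  # ~0.667, "substantial" on the Landis-Koch scale
```

Values above roughly 0.8, as reported here, are conventionally read as "almost perfect" agreement on the Landis-Koch scale.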

  • Article type: Journal Article
    BACKGROUND: With the increasing application of large language models like ChatGPT in various industries, its potential in the medical domain, especially in standardized examinations, has become a focal point of research.
    OBJECTIVE: The aim of this study is to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability in the Chinese National Medical Licensing Examination (CNMLE).
    METHODS: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 vs GPT-4.0), the prompt's designation of system roles tailored to medical subspecialties, and repetition for coherence. A passing accuracy threshold was established as 60%. χ2 tests and κ values were employed to evaluate the model's accuracy and consistency.
    RESULTS: GPT-4.0 achieved a passing accuracy of 72.7%, which was significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). However, both models showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%), and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy among different question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties, while GPT-3.5 did so in 7 of 15 on the first response.
    CONCLUSIONS: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role did not significantly enhance the model's reliability or answer coherence. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.
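An accuracy comparison like GPT-4.0's 72.7% vs GPT-3.5's 54% is typically tested with a Pearson χ² statistic on a 2×2 contingency table. A self-contained sketch; the counts below are hypothetical round numbers approximating the reported rates, not the study's raw data:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: correct/incorrect answers for two model versions,
# roughly matching 72.8% vs 54% accuracy on 500 questions each
stat = chi2_2x2(364, 136, 270, 230)  # ~38, far above the df=1 cutoff
```

For df=1, a statistic above 10.83 corresponds to P<.001, consistent with the significance reported above.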

  • Article type: Journal Article
    The advent of large language models (LLMs) such as ChatGPT has potential implications for psychological therapies such as cognitive behavioral therapy (CBT). We systematically investigated whether LLMs could recognize an unhelpful thought, examine its validity, and reframe it to a more helpful one. LLMs currently have the potential to offer reasonable suggestions for the identification and reframing of unhelpful thoughts but should not be relied on to lead CBT delivery.

  • Article type: Journal Article
    BACKGROUND: The rising prevalence of noncommunicable diseases (NCDs) worldwide and the high recent mortality rates (74.4%) associated with them, especially in low- and middle-income countries, are causing a substantial global burden of disease, necessitating innovative and sustainable long-term care solutions.
    OBJECTIVE: This scoping review aims to investigate the impact of artificial intelligence (AI)-based conversational agents (CAs)-including chatbots, voicebots, and anthropomorphic digital avatars-as human-like health caregivers in the remote management of NCDs as well as identify critical areas for future research and provide insights into how these technologies might be used effectively in health care to personalize NCD management strategies.
    METHODS: A broad literature search was conducted in July 2023 in 6 electronic databases-Ovid MEDLINE, Embase, PsycINFO, PubMed, CINAHL, and Web of Science-using the search terms "conversational agents," "artificial intelligence," and "noncommunicable diseases," including their associated synonyms. We also manually searched gray literature using sources such as ProQuest Central, ResearchGate, ACM Digital Library, and Google Scholar. We included empirical studies published in English from January 2010 to July 2023 focusing solely on health care-oriented applications of CAs used for remote management of NCDs. The narrative synthesis approach was used to collate and summarize the relevant information extracted from the included studies.
    RESULTS: The literature search yielded a total of 43 studies that matched the inclusion criteria. Our review unveiled four significant findings: (1) higher user acceptance and compliance with anthropomorphic and avatar-based CAs for remote care; (2) an existing gap in the development of personalized, empathetic, and contextually aware CAs for effective emotional and social interaction with users, along with limited consideration of ethical concerns such as data privacy and patient safety; (3) inadequate evidence of the efficacy of CAs in NCD self-management despite a moderate to high level of optimism among health care professionals regarding CAs' potential in remote health care; and (4) CAs primarily being used for supporting nonpharmacological interventions such as behavioral or lifestyle modifications and patient education for the self-management of NCDs.
    CONCLUSIONS: This review makes a unique contribution to the field by not only providing a quantifiable impact analysis but also identifying the areas requiring imminent scholarly attention for the ethical, empathetic, and efficacious implementation of AI in NCD care. This serves as an academic cornerstone for future research in AI-assisted health care for NCD management.
    TRIAL REGISTRATION: Open Science Framework; https://doi.org/10.17605/OSF.IO/GU5PX.

  • Article type: Journal Article
    BACKGROUND: Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI's GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet, rigorous investigations comparing their performance to human experts remain sparse.
    OBJECTIVE: This study aims to compare the medical accuracy of GPT-4 with human experts in providing medical advice using real-world user-generated queries, with a specific focus on cardiology. It also sought to analyze the performance of GPT-4 and human experts in specific question categories, including drug or medication information and preliminary diagnoses.
    METHODS: We collected 251 pairs of cardiology-specific questions from general users and answers from human experts via an internet portal. GPT-4 was tasked with generating responses to the same questions. Three independent cardiologists (SL, JHK, and JJC) evaluated the answers provided by both human experts and GPT-4. Using a computer interface, each evaluator compared the pairs and determined which answer was superior, and they quantitatively measured the clarity and complexity of the questions as well as the accuracy and appropriateness of the responses, applying a 3-tiered grading scale (low, medium, and high). Furthermore, a linguistic analysis was conducted to compare the length and vocabulary diversity of the responses using word count and type-token ratio.
    RESULTS: GPT-4 and human experts displayed comparable efficacy in medical accuracy ("GPT-4 is better" at 132/251, 52.6% vs "Human expert is better" at 119/251, 47.4%). In accuracy level categorization, humans had more high-accuracy responses than GPT-4 (50/237, 21.1% vs 30/238, 12.6%) but also a greater proportion of low-accuracy responses (11/237, 4.6% vs 1/238, 0.4%; P=.001). GPT-4 responses were generally longer and used a less diverse vocabulary than those of human experts, potentially enhancing their comprehensibility for general users (sentence count: mean 10.9, SD 4.2 vs mean 5.9, SD 3.7; P<.001; type-token ratio: mean 0.69, SD 0.07 vs mean 0.79, SD 0.09; P<.001). Nevertheless, human experts outperformed GPT-4 in specific question categories, notably those related to drug or medication information and preliminary diagnoses. These findings highlight the limitations of GPT-4 in providing advice based on clinical experience.
    CONCLUSIONS: GPT-4 has shown promising potential in automated medical consultation, with comparable medical accuracy to human experts. However, challenges remain, particularly in the realm of nuanced clinical judgment. Future improvements in LLMs may require the integration of specific clinical reasoning pathways and regulatory oversight for safe use. Further research is needed to understand the full potential of LLMs across various medical specialties and conditions.
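Word count and type-token ratio, as used in the linguistic analysis above, can be sketched as follows. The regex tokenization here is an assumption for illustration; the study's exact preprocessing is not specified:

```python
import re

def lexical_stats(text):
    """Return (word count, type-token ratio) for a response string."""
    # Assumed tokenization: lowercase alphabetic runs, apostrophes kept
    tokens = re.findall(r"[a-z']+", text.lower())
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    return len(tokens), ttr

words, ttr = lexical_stats("Take the medication twice daily; the medication is safe.")
# 9 tokens, 7 distinct types -> ttr = 7/9
```

A lower type-token ratio indicates more repeated vocabulary, consistent with GPT-4's longer but less diverse responses reported above.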

  • Article type: Journal Article
    Internet-based cognitive behavioral therapy (iCBT) offers a scalable, cost-effective, accessible, and low-threshold form of psychotherapy. Recent advancements explored the use of conversational agents such as chatbots and voice assistants to enhance the delivery of iCBT. These agents can deliver iCBT-based exercises, recognize and track emotional states, assess therapy progress, convey empathy, and potentially predict long-term therapy outcome. However, existing systems predominantly utilize categorical approaches for emotional modeling, which can oversimplify the complexity of human emotional states. To address this, we developed a transformer-based model for dimensional text-based emotion recognition, fine-tuned with a novel, comprehensive dimensional emotion dataset comprising 75,503 samples. This model significantly outperforms existing state-of-the-art models in detecting the dimensions of valence, arousal, and dominance, achieving a Pearson correlation coefficient of r = 0.90, r = 0.77, and r = 0.64, respectively. Furthermore, a feasibility study involving 20 participants confirmed the model's technical effectiveness and its usability, acceptance, and empathic understanding in a conversational agent-based iCBT setting, marking a substantial improvement in personalized and effective therapy experiences.
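The Pearson correlation coefficients reported for valence, arousal, and dominance measure linear agreement between predicted and annotated dimensional scores. A minimal sketch; the score values below are toy numbers, not the study's data:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Toy valence scores: model predictions vs human annotations
predicted = [0.1, 0.4, 0.5, 0.8, 0.9]
annotated = [0.2, 0.3, 0.6, 0.7, 1.0]
r = pearson_r(predicted, annotated)  # close to 1 -> strong linear agreement
```

r ranges from -1 to 1; the paper's r = 0.90 for valence indicates the predicted and annotated scores move almost perfectly in step.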

  • Article type: Journal Article
    OBJECTIVE: This scoping review aimed to review the characteristics, applications, evaluation approaches, and challenges regarding the use of chatbots in older adults.
    METHODS: The scoping review followed the methodological framework by Arksey and O'Malley, with revisions proposed by Levac et al. The findings were reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist.
    METHODS: The reviewed articles primarily focused on older adults, with research conducted in both clinical and nonclinical settings.
    METHODS: Studies published from January 2010 to May 2023 were searched through 8 databases. A total of 29 studies were identified and evaluated in this review.
    RESULTS: Results showed that the chatbots were mainly delivered via mobile applications (n = 11); most used text as the input (n = 16) and output modality (n = 13); and most targeted improving the overall well-being of older adults (n = 9). Most chatbots were designed to fulfill complex health care needs (n = 7) and health information collection (n = 6). Evaluation approaches captured in this review were divided into technical performance, user acceptability, and effectiveness; challenges of applying chatbots to older adults lie in chatbot design, user perception, and operational difficulties.
    CONCLUSIONS: The use of chatbots in the field of older adults is still emerging, with a lack of specifically designed options for older users. Data about the health impact of chatbots as alternative interventions were still limited. More standardized evaluation criteria and robust controlled experiments are needed for further research regarding the effectiveness of chatbots in older adults.

  • Article type: Journal Article
    BACKGROUND: Conversational chatbots are an emerging digital intervention for smoking cessation. No studies have reported on the entire development process of a cessation chatbot.
    OBJECTIVE: We aim to report results of the user-centered design development process and randomized controlled trial for a novel and comprehensive quit smoking conversational chatbot called QuitBot.
    METHODS: The 4 years of formative research for developing QuitBot followed an 11-step process: (1) specifying a conceptual model; (2) conducting content analysis of existing interventions (63 hours of intervention transcripts); (3) assessing user needs; (4) developing the chat's persona ("personality"); (5) prototyping content and persona; (6) developing full functionality; (7) programming the QuitBot; (8) conducting a diary study; (9) conducting a pilot randomized controlled trial (RCT); (10) reviewing results of the RCT; and (11) adding a free-form question and answer (QnA) function, based on user feedback from pilot RCT results. The process of adding a QnA function itself involved a three-step process: (1) generating QnA pairs, (2) fine-tuning large language models (LLMs) on QnA pairs, and (3) evaluating the LLM outputs.
    RESULTS: We developed a quit smoking program spanning 42 days of 2- to 3-minute conversations covering topics ranging from motivations to quit, setting a quit date, choosing Food and Drug Administration-approved cessation medications, coping with triggers, and recovering from lapses and relapses. In a pilot RCT with 96% three-month outcome data retention, QuitBot demonstrated high user engagement and promising cessation rates compared to the National Cancer Institute's SmokefreeTXT text messaging program, particularly among those who viewed all 42 days of program content: 30-day, complete-case, point prevalence abstinence rates at 3-month follow-up were 63% (39/62) for QuitBot versus 38.5% (45/117) for SmokefreeTXT (odds ratio 2.58, 95% CI 1.34-4.99; P=.005). However, Facebook Messenger intermittently blocked participants' access to QuitBot, so we transitioned from Facebook Messenger to a stand-alone smartphone app as the communication channel. Participants' frustration with QuitBot's inability to answer their open-ended questions led us to develop a core conversational feature, enabling users to ask open-ended questions about quitting cigarette smoking and for the QuitBot to respond with accurate and professional answers. To support this functionality, we developed a library of 11,000 QnA pairs on topics associated with quitting cigarette smoking. Model testing results showed that Microsoft's Azure-based QnA maker effectively handled questions that matched our library of 11,000 QnA pairs. A fine-tuned, contextualized GPT-3.5 (OpenAI) responds to questions that are not within our library of QnA pairs.
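The abstinence rates above follow directly from the reported counts. A crude (unadjusted) odds ratio computed from the raw 2×2 counts comes out near, but not exactly equal to, the reported 2.58, which presumably reflects model adjustment; a minimal sketch:

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Crude odds ratio: odds of the event in group A over group B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b

quitbot_rate = 39 / 62    # ~0.629 -> the 63% abstinence reported above
sft_rate = 45 / 117       # ~0.385 -> the 38.5% reported above
crude_or = odds_ratio(39, 62, 45, 117)  # ~2.71, unadjusted
```

The gap between the crude ~2.71 and the reported 2.58 is expected whenever the published estimate comes from an adjusted model rather than the raw table.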
    CONCLUSIONS: The development process yielded the first LLM-based quit smoking program delivered as a conversational chatbot. Iterative testing led to significant enhancements, including improvements to the delivery channel. A pivotal addition was the inclusion of a core LLM-supported conversational feature allowing users to ask open-ended questions.
    TRIAL REGISTRATION: ClinicalTrials.gov NCT03585231; https://clinicaltrials.gov/study/NCT03585231.

  • Article type: Journal Article
    Human papillomavirus (HPV) vaccination rates are lower than expected. To protect against the onset of head and neck cancers, innovative strategies to improve the rates are needed. Artificial intelligence may offer some solutions, specifically conversational agents to perform counseling methods. We present our efforts in developing a dialogue model for automating motivational interviewing (MI) to encourage HPV vaccination. We developed a formalized dialogue model for MI using an existing ontology-based framework to manifest a computable representation using OWL 2. New utterance classifications were identified along with the ontology that encodes the dialogue model. Our work is available on GitHub under the GPL v.3. We discuss how an ontology-based model of MI can help standardize/formalize MI counseling for HPV vaccine uptake. Our future steps will involve assessing the MI fidelity of the ontology model, operationalization, and testing the dialogue model in a simulation with live participants.