AI chatbot

  • Article type: Journal Article
    BACKGROUND: Singapore, like the rest of Asia, faces persistent challenges to mental health promotion, including stigma around unwellness and seeking treatment and a lack of trained mental health personnel. The COVID-19 pandemic, which created a surge in mental health care needs and simultaneously accelerated the adoption of digital health solutions, revealed a new opportunity to quickly scale innovative solutions in the region.
    OBJECTIVE: In June 2020, the Singaporean government launched mindline.sg, an anonymous digital mental health resource website that has grown to include >500 curated local mental health resources, a clinically validated self-assessment tool for depression and anxiety, an artificial intelligence (AI) chatbot from Wysa designed to deliver digital therapeutic exercises, and a tailored version of the website for working adults called mindline at work. The goal of the platform is to empower Singapore residents to take charge of their own mental health and to be able to offer basic support to those around them through the ease and convenience of a barrier-free digital solution.
    METHODS: Website use is measured through click-level data analytics captured via Google Analytics and custom application programming interfaces, which in turn drive a customized analytics infrastructure based on the open-source platforms Titanium Database and Metabase. Unique, nonbounced (users that do not immediately navigate away from the site), engaged, and return users are reported.
    RESULTS: In the 2 years following launch (July 1, 2020, through June 30, 2022), the website received >447,000 visitors (approximately 15% of the target population of 3 million), 62.02% (277,727/447,783) of whom explored the site or engaged with resources (referred to as nonbounced visitors); 10.54% (29,271/277,727) of those nonbounced visitors returned. The most popular features on the platform were the dialogue-based therapeutic exercises delivered by the chatbot and the self-assessment tool, which were used by 25.54% (67,626/264,758) and 11.69% (32,469/277,727) of nonbounced visitors. On mindline at work, the rates of nonbounced visitors who engaged extensively (ie, spent ≥40 seconds exploring resources) and who returned were 51.56% (22,474/43,588) and 13.43% (5,853/43,588) over a year, respectively, compared to 30.9% (42,829/138,626) and 9.97% (13,822/138,626), respectively, on the generic mindline.sg site in the same year.
    CONCLUSIONS: The site has achieved desired reach and has seen a strong growth rate in the number of visitors, which required substantial and sustained digital marketing campaigns and strategic outreach partnerships. The site was careful to preserve anonymity, limiting the detail of analytics. The good levels of overall adoption encourage us to believe that mild to moderate mental health conditions and the social factors that underlie them are amenable to digital interventions. While mindline.sg was primarily used in Singapore, we believe that similar solutions with local customization are widely and globally applicable.
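As a quick sanity check on the usage figures reported in this abstract, the quoted percentages can be reproduced from the raw visitor counts. A minimal sketch (the counts are taken directly from the abstract; note the chatbot metric uses a slightly different denominator than the other nonbounced-visitor metrics):

```python
# Visitor counts quoted in the abstract: (numerator, denominator, reported %).
reported = [
    (277_727, 447_783, 62.02),  # nonbounced visitors / all visitors
    (29_271, 277_727, 10.54),   # returning visitors / nonbounced visitors
    (67_626, 264_758, 25.54),   # chatbot exercise users (chatbot denominator)
    (32_469, 277_727, 11.69),   # self-assessment tool users / nonbounced
]

for numerator, denominator, pct in reported:
    computed = round(100 * numerator / denominator, 2)
    assert computed == pct, (numerator, denominator, computed, pct)
print("all reported percentages reproduced")
```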

  • Article type: Journal Article
    BACKGROUND: Spondyloarthritis (SpA), a chronic inflammatory disorder, predominantly impacts the sacroiliac joints and spine, significantly escalating the risk of disability. SpA's complexity, as evidenced by its diverse clinical presentations and symptoms that often mimic other diseases, presents substantial challenges in its accurate diagnosis and differentiation. This complexity becomes even more pronounced in nonspecialist health care environments due to limited resources, resulting in delayed referrals, increased misdiagnosis rates, and exacerbated disability outcomes for patients with SpA. The emergence of large language models (LLMs) in medical diagnostics introduces a revolutionary potential to overcome these diagnostic hurdles. Despite recent advancements in artificial intelligence and LLMs demonstrating effectiveness in diagnosing and treating various diseases, their application in SpA remains underdeveloped. Currently, there is a notable absence of SpA-specific LLMs and an established benchmark for assessing the performance of such models in this particular field.
    OBJECTIVE: Our objective is to develop a foundational medical model, creating a comprehensive evaluation benchmark tailored to the essential medical knowledge of SpA and its unique diagnostic and treatment protocols. The model, post-pretraining, will be subject to further enhancement through supervised fine-tuning. It is projected to significantly aid physicians in SpA diagnosis and treatment, especially in settings with limited access to specialized care. Furthermore, this initiative is poised to promote early and accurate SpA detection at the primary care level, thereby diminishing the risks associated with delayed or incorrect diagnoses.
    METHODS: A rigorous benchmark, comprising 222 meticulously formulated multiple-choice questions on SpA, will be established and developed. These questions will be extensively revised to ensure their suitability for accurately evaluating LLMs' performance in real-world diagnostic and therapeutic scenarios. Our methodology involves selecting and refining top foundational models using public data sets. The best-performing model in our benchmark will undergo further training. Subsequently, more than 80,000 real-world inpatient and outpatient cases from hospitals will enhance LLM training, incorporating techniques such as supervised fine-tuning and low-rank adaptation. We will rigorously assess the models' generated responses for accuracy and evaluate their reasoning processes using the metrics of fluency, relevance, completeness, and medical proficiency.
    RESULTS: Development of the model is progressing, with significant enhancements anticipated by early 2024. The benchmark, along with the results of evaluations, is expected to be released in the second quarter of 2024.
    CONCLUSIONS: Our trained model aims to capitalize on the capabilities of LLMs in analyzing complex clinical data, thereby enabling precise detection, diagnosis, and treatment of SpA. This innovation is anticipated to play a vital role in diminishing the disabilities arising from delayed or incorrect SpA diagnoses. By promoting this model across diverse health care settings, we anticipate a significant improvement in SpA management, culminating in enhanced patient outcomes and a reduced overall burden of the disease.
    UNASSIGNED: DERR1-10.2196/57001.

  • Article type: Journal Article
    To evaluate the accuracy of AI chatbots in staging pressure injuries according to the National Pressure Injury Advisory Panel (NPIAP) Staging through clinical image interpretation, a cross-sectional study was conducted to assess five leading publicly available AI chatbots. As a result, three chatbots were unable to interpret the clinical images, whereas GPT-4 Turbo achieved a high accuracy rate (83.0%) in staging pressure injuries, notably outperforming BingAI Creative mode (24.0%) with statistical significance (p < 0.001). GPT-4 Turbo accurately identified Stages 1 (p < 0.001), 3 (p = 0.001), and 4 (p < 0.001) pressure injuries, and suspected deep tissue injuries (p < 0.001), while BingAI demonstrated significantly lower accuracy across all stages. The findings highlight the potential of AI chatbots, especially GPT-4 Turbo, in accurately diagnosing images and aiding the subsequent management of pressure injuries.

  • Article type: Journal Article
    This narrative literature review undertakes a comprehensive examination of the burgeoning field, tracing the development of artificial intelligence (AI)-powered tools for depression and anxiety detection from the level of intricate algorithms to practical applications. Delivering essential mental health care services is now a significant public health priority. In recent years, AI has become a game-changer in the early identification and intervention of these pervasive mental health disorders. AI tools can potentially empower behavioral healthcare services by helping psychiatrists collect objective data on patients' progress and tasks. This study emphasizes the current understanding of AI, the different types of AI, its current use in multiple mental health disorders, advantages, disadvantages, and future potentials. As technology develops and the digitalization of the modern era increases, there will be a rise in the application of artificial intelligence in psychiatry; therefore, a comprehensive understanding will be needed. We searched PubMed, Google Scholar, and Science Direct using keywords for this. In a recent review of studies using electronic health records (EHR) with AI and machine learning techniques for diagnosing all clinical conditions, roughly 99 publications have been found. Out of these, 35 studies were identified for mental health disorders in all age groups, and among them, six studies utilized EHR data sources. By critically analyzing prominent scholarly works, we aim to illuminate the current state of this technology, exploring its successes, limitations, and future directions. In doing so, we hope to contribute to a nuanced understanding of AI's potential to revolutionize mental health diagnostics and pave the way for further research and development in this critically important domain.

  • Article type: Journal Article
    OBJECTIVE: This study aimed to evaluate the effects of an artificial intelligence (AI) chatbot (ChatGPT-3.5, OpenAI) on preoperative anxiety reduction and patient satisfaction in adult patients undergoing surgery under general anesthesia.
    METHODS: The study used a single-blind, randomized controlled trial design.
    METHODS: In this study, 100 adult patients were enrolled and divided into two groups: 50 in the control group, in which patients received standard preoperative information from anesthesia nurses, and 50 in the intervention group, in which patients interacted with ChatGPT. The primary outcome, preoperative anxiety reduction, was measured using the Japanese State-Trait Anxiety Inventory (STAI) self-report questionnaire. The secondary endpoints included participant satisfaction (Q1), comprehension of the treatment process (Q2), and the perception of the AI chatbot's responses as more relevant than those of the nurses (Q3).
    RESULTS: Of the 85 participants who completed the study, the STAI scores in the control group remained stable, whereas those in the intervention group decreased. The mixed-effects model showed significant effects of time and group-time interaction on the STAI scores; however, no main group effect was observed. The secondary endpoints revealed mixed results; some patients found that the chatbot\'s responses were more relevant, whereas others were dissatisfied or experienced difficulties.
    CONCLUSIONS: The ChatGPT intervention significantly reduced preoperative anxiety compared with the control group; however, no overall difference in the STAI scores was observed. The mixed secondary endpoint results highlight the need for refining chatbot algorithms and knowledge bases to improve performance and satisfaction. AI chatbots should complement, rather than replace, human health care providers. Seamless integration and effective communication among AI chatbots, patients, and health care providers are essential for optimizing patient outcomes.

  • Article type: Journal Article
    Developers and vendors of large language models ("LLMs"), with ChatGPT, Google Bard, and Microsoft's Bing at the forefront, can be subject to the Health Insurance Portability and Accountability Act of 1996 ("HIPAA") when they process protected health information ("PHI") on behalf of HIPAA-covered entities. In doing so, they become business associates, or subcontractors of a business associate, under HIPAA.

  • Article type: Journal Article
    This study is centered on investigating the acceptance and utilization of AI Chatbot technology among graduate students in China and its implications for higher education. Employing a fusion of the UTAUT (Unified Theory of Acceptance and Use of Technology) model and the ECM (Expectation-Confirmation Model), the research seeks to pinpoint the pivotal factors influencing students' attitudes, satisfaction, and behavioral intentions regarding AI Chatbots. The study constructs a model comprising seven substantial predictors aimed at precisely foreseeing users' intentions and behavior with AI Chatbots. Collected from 373 students enrolled in various universities across China, the self-reported data is subject to analysis using the partial least squares method of structural equation modeling to confirm the model's reliability and validity. The findings validate seven out of the eleven proposed hypotheses, underscoring the influential role of ECM constructs, particularly "Confirmation" and "Satisfaction," outweighing the impact of UTAUT constructs on users' behavior. Specifically, users' perceived confirmation significantly influences their satisfaction and subsequent intention to continue using AI Chatbots. Additionally, "Personal innovativeness" emerges as a critical determinant shaping users' behavioral intention. This research emphasizes the need for further exploration of AI tool adoption in educational settings and encourages continued investigation of their potential in teaching and learning environments.

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence (AI) is a rapidly developing field with the potential to transform various aspects of health care and public health, including medical training. During the "Hygiene and Public Health" course for fifth-year medical students, a practical training session was conducted on vaccination using AI chatbots as an educational supportive tool. Before receiving specific training on vaccination, the students were given a web-based test extracted from the Italian National Medical Residency Test. After completing the test, a critical correction of each question was performed assisted by AI chatbots.
    OBJECTIVE: The main aim of this study was to identify whether AI chatbots can be considered educational support tools for training in public health. The secondary objective was to assess the performance of different AI chatbots on complex multiple-choice medical questions in the Italian language.
    METHODS: A test composed of 15 multiple-choice questions on vaccination was extracted from the Italian National Medical Residency Test using targeted keywords and administered to medical students via Google Forms and to different AI chatbot models (Bing Chat, ChatGPT, Chatsonic, Google Bard, and YouChat). The correction of the test was conducted in the classroom, focusing on the critical evaluation of the explanations provided by the chatbot. A Mann-Whitney U test was conducted to compare the performances of medical students and AI chatbots. Student feedback was collected anonymously at the end of the training experience.
    RESULTS: In total, 36 medical students and 5 AI chatbot models completed the test. The students achieved an average score of 8.22 (SD 2.65) out of 15, while the AI chatbots scored an average of 12.22 (SD 2.77). The results indicated a statistically significant difference in performance between the 2 groups (U=49.5, P<.001), with a large effect size (r=0.69). When divided by question type (direct, scenario-based, and negative), significant differences were observed in direct (P<.001) and scenario-based (P<.001) questions, but not in negative questions (P=.48). The students reported a high level of satisfaction (7.9/10) with the educational experience, expressing a strong desire to repeat the experience (7.6/10).
    CONCLUSIONS: This study demonstrated the efficacy of AI chatbots in answering complex medical questions related to vaccination and providing valuable educational support. Their performance significantly surpassed that of medical students in direct and scenario-based questions. The responsible and critical use of AI chatbots can enhance medical education, making it an essential aspect to integrate into the educational system.
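The Mann-Whitney U statistic used in this study to compare student and chatbot scores can be computed by hand from pooled ranks. A minimal pure-Python sketch; the test scores below are made up for illustration and are not the study's data:

```python
def average_ranks(values):
    """Map each distinct value to its average 1-based rank, handling ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U1, U2) for samples x and y via the rank-sum formula."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[v] for v in x)  # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    return u1, len(x) * len(y) - u1

# Hypothetical test scores out of 15 (illustrative only, not the study's data).
students = [6, 7, 8, 8, 9, 10]
chatbots = [12, 12, 13, 14, 15]
u1, u2 = mann_whitney_u(students, chatbots)
print(u1, u2)  # complete separation of the groups gives U1 = 0.0, U2 = 30.0
```

The p-value then comes from the exact U distribution (or a normal approximation for larger samples), which a library such as scipy handles in practice.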

  • Article type: Journal Article
    OBJECTIVE: To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots, namely ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities.
    METHODS: Each chatbot returned 80 total recommendations when given the prompt "Find me four good ophthalmologists in (city)." Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson's chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and recommendation accuracy.
    RESULTS: The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). ChatGPT recommended fewer female (29.5%) than male ophthalmologists (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) gave high rates of inaccurate recommendations. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots.
    CONCLUSIONS: This study revealed substantial bias and inaccuracy in the AI chatbots' recommendations. They struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city. Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all chatbots favored recommending ophthalmologists in academic medicine.
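The one-proportion z-test this study applies compares an observed proportion of female recommendations against the AAMC benchmark of 27.2%. A minimal sketch of the mechanics; the sample of 6 female ophthalmologists out of 75 identified physicians is a made-up figure for illustration (the abstract reports only percentages):

```python
from math import sqrt

def one_proportion_z(successes, n, p0):
    """z statistic for H0: the true proportion equals p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    return (p_hat - p0) / se

# Hypothetical: 6 female ophthalmologists among 75 recommendations (8.0%),
# tested against the AAMC national proportion of 27.2%.
z = one_proportion_z(6, 75, 0.272)
print(round(z, 2))  # well below -1.96, hence significant at the 5% level
```

With |z| above the two-sided 5% critical value of 1.96, the null of equality with the national proportion would be rejected.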

  • Article type: Journal Article
    BACKGROUND: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown.
    OBJECTIVE: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan.
    METHODS: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and presented them as clinical vignettes. Physicians entered the clinical vignettes into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and the top diagnosis.
    RESULTS: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and within the top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43), although the difference was not significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022).
    CONCLUSIONS: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.
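The top-10 / top-5 / top-1 metric used in this study (whether the final diagnosis appears among the first k suggestions of a ranked differential list) can be sketched as follows. The example diagnosis lists are invented for illustration, not taken from the study:

```python
def top_k_rate(ranked_lists, final_diagnoses, k):
    """Fraction of cases whose final diagnosis appears in the top k suggestions."""
    hits = sum(
        final in ranked[:k]
        for ranked, final in zip(ranked_lists, final_diagnoses)
    )
    return hits / len(final_diagnoses)

# Invented example: ranked differential lists for three cases.
lists = [
    ["pulmonary embolism", "pneumonia", "heart failure"],
    ["lymphoma", "tuberculosis", "sarcoidosis"],
    ["gout", "septic arthritis", "pseudogout"],
]
truth = ["pneumonia", "sarcoidosis", "reactive arthritis"]

print(top_k_rate(lists, truth, 1))  # 0.0: no case matched at rank 1
print(top_k_rate(lists, truth, 3))  # 2 of 3 final diagnoses within the top 3
```

Applied to the study's 52 vignettes with k = 10, 5, and 1, this yields the 83%, 81%, and 60% figures reported for ChatGPT-4.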