AI chatbot

  • Article Type: Journal Article
    BACKGROUND: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.
    METHODS: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were input into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0 (see the code sketch after this abstract).
    RESULTS: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In the overall assessment, 3 points was the most common score for 3.5 (36%), while 4 points was predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas.
    CONCLUSIONS: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
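
    A minimal sketch of the paired comparison described in the methods above, assuming each rater's overall scores are averaged per model version; the score vectors are hypothetical draws matching the reported means, not the study data.

```python
# Hedged illustration of the paired sample t-test from the abstract above.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
# Hypothetical per-rater mean overall scores (31 raters, 5-point Likert scale),
# centered on the reported means of 3.21 (ChatGPT 3.5) and 3.67 (ChatGPT 4.0).
scores_gpt35 = rng.normal(3.21, 0.5, size=31).clip(1, 5)
scores_gpt40 = rng.normal(3.67, 0.5, size=31).clip(1, 5)

# Paired test: each rater scored both model versions on the same questions.
t_stat, p_value = ttest_rel(scores_gpt35, scores_gpt40)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```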

  • Article Type: Journal Article
    BACKGROUND: Singapore, like the rest of Asia, faces persistent challenges to mental health promotion, including stigma around unwellness and seeking treatment and a lack of trained mental health personnel. The COVID-19 pandemic, which created a surge in mental health care needs and simultaneously accelerated the adoption of digital health solutions, revealed a new opportunity to quickly scale innovative solutions in the region.
    OBJECTIVE: In June 2020, the Singaporean government launched mindline.sg, an anonymous digital mental health resource website that has grown to include >500 curated local mental health resources, a clinically validated self-assessment tool for depression and anxiety, an artificial intelligence (AI) chatbot from Wysa designed to deliver digital therapeutic exercises, and a tailored version of the website for working adults called mindline at work. The goal of the platform is to empower Singapore residents to take charge of their own mental health and to be able to offer basic support to those around them through the ease and convenience of a barrier-free digital solution.
    METHODS: Website use is measured through click-level data analytics captured via Google Analytics and custom application programming interfaces, which in turn drive a customized analytics infrastructure based on the open-source platforms Titanium Database and Metabase. Unique, nonbounced (users who do not immediately navigate away from the site), engaged, and return users are reported (see the code sketch after this abstract).
    RESULTS: In the 2 years following launch (July 1, 2020, through June 30, 2022), the website received >447,000 visitors (approximately 15% of the target population of 3 million), 62.02% (277,727/447,783) of whom explored the site or engaged with resources (referred to as nonbounced visitors); 10.54% (29,271/277,727) of those nonbounced visitors returned. The most popular features on the platform were the dialogue-based therapeutic exercises delivered by the chatbot and the self-assessment tool, which were used by 25.54% (67,626/264,758) and 11.69% (32,469/277,727) of nonbounced visitors. On mindline at work, the rates of nonbounced visitors who engaged extensively (ie, spent ≥40 seconds exploring resources) and who returned were 51.56% (22,474/43,588) and 13.43% (5,853/43,588) over a year, respectively, compared to 30.9% (42,829/138,626) and 9.97% (13,822/138,626), respectively, on the generic mindline.sg site in the same year.
    CONCLUSIONS: The site has achieved the desired reach and has seen a strong growth rate in the number of visitors, which required substantial and sustained digital marketing campaigns and strategic outreach partnerships. The site was careful to preserve anonymity, limiting the detail of analytics. The good levels of overall adoption encourage us to believe that mild to moderate mental health conditions and the social factors that underlie them are amenable to digital interventions. While mindline.sg was primarily used in Singapore, we believe that similar solutions with local customization are widely and globally applicable.
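
    A minimal sketch of how visitor metrics like those above could be derived from click-level event logs, assuming simplified definitions (bounce = single page view; engaged = at least 40 seconds, per the abstract); the column names and data are hypothetical, not the mindline.sg pipeline.

```python
# Hedged illustration of session-level web-analytics metrics with pandas.
import pandas as pd

# Hypothetical click-level event log: one row per page view.
events = pd.DataFrame({
    "visitor_id": ["a", "a", "b", "c", "c", "c", "a"],
    "session_id": [1, 1, 2, 3, 3, 4, 5],
    "seconds_on_page": [5, 60, 2, 45, 30, 50, 12],
})

sessions = events.groupby(["visitor_id", "session_id"]).agg(
    page_views=("seconds_on_page", "size"),
    total_seconds=("seconds_on_page", "sum"),
)

# Nonbounced: more than one page view in the session.
nonbounced = sessions[sessions["page_views"] > 1]
# Engaged: spent at least 40 seconds in the session.
engaged = sessions[sessions["total_seconds"] >= 40]
# Return visitors: more than one session overall.
session_counts = sessions.reset_index().groupby("visitor_id")["session_id"].nunique()

print(f"nonbounce rate: {len(nonbounced) / len(sessions):.0%}")
print(f"engaged rate:   {len(engaged) / len(sessions):.0%}")
print(f"return rate:    {(session_counts > 1).mean():.0%}")
```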

  • Article Type: Journal Article
    BACKGROUND: Spondyloarthritis (SpA), a chronic inflammatory disorder, predominantly impacts the sacroiliac joints and spine, significantly escalating the risk of disability. SpA's complexity, as evidenced by its diverse clinical presentations and symptoms that often mimic other diseases, presents substantial challenges in its accurate diagnosis and differentiation. This complexity becomes even more pronounced in nonspecialist health care environments due to limited resources, resulting in delayed referrals, increased misdiagnosis rates, and exacerbated disability outcomes for patients with SpA. The emergence of large language models (LLMs) in medical diagnostics introduces a revolutionary potential to overcome these diagnostic hurdles. Despite recent advancements in artificial intelligence and LLMs demonstrating effectiveness in diagnosing and treating various diseases, their application in SpA remains underdeveloped. Currently, there is a notable absence of SpA-specific LLMs and an established benchmark for assessing the performance of such models in this particular field.
    OBJECTIVE: Our objective is to develop a foundational medical model, creating a comprehensive evaluation benchmark tailored to the essential medical knowledge of SpA and its unique diagnostic and treatment protocols. The model, post-pretraining, will be subject to further enhancement through supervised fine-tuning. It is projected to significantly aid physicians in SpA diagnosis and treatment, especially in settings with limited access to specialized care. Furthermore, this initiative is poised to promote early and accurate SpA detection at the primary care level, thereby diminishing the risks associated with delayed or incorrect diagnoses.
    METHODS: A rigorous benchmark, comprising 222 meticulously formulated multiple-choice questions on SpA, will be established and developed. These questions will be extensively revised to ensure their suitability for accurately evaluating LLMs' performance in real-world diagnostic and therapeutic scenarios. Our methodology involves selecting and refining top foundational models using public data sets. The best-performing model in our benchmark will undergo further training. Subsequently, more than 80,000 real-world inpatient and outpatient cases from hospitals will enhance LLM training, incorporating techniques such as supervised fine-tuning and low-rank adaptation (see the code sketch after this abstract). We will rigorously assess the models' generated responses for accuracy and evaluate their reasoning processes using the metrics of fluency, relevance, completeness, and medical proficiency.
    RESULTS: Development of the model is progressing, with significant enhancements anticipated by early 2024. The benchmark, along with the results of evaluations, is expected to be released in the second quarter of 2024.
    CONCLUSIONS: Our trained model aims to capitalize on the capabilities of LLMs in analyzing complex clinical data, thereby enabling precise detection, diagnosis, and treatment of SpA. This innovation is anticipated to play a vital role in diminishing the disabilities arising from delayed or incorrect SpA diagnoses. By promoting this model across diverse health care settings, we anticipate a significant improvement in SpA management, culminating in enhanced patient outcomes and a reduced overall burden of the disease.
    INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/57001.
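
    A minimal sketch of supervised fine-tuning with low-rank adaptation (LoRA), the techniques named in the methods above, using the Hugging Face transformers and peft libraries; the base model and hyperparameters are illustrative assumptions, not the protocol's actual choices.

```python
# Hedged sketch of LoRA-based supervised fine-tuning setup.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2-7B"  # placeholder base model, not specified in the protocol
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train

# Standard supervised fine-tuning on labeled clinical question-answer pairs
# (e.g., with transformers.Trainer) would follow from here.
```

    LoRA keeps the base weights frozen and trains only small rank-r update matrices, which is why it is a common choice for adapting a large model to roughly 80,000 clinical cases on modest hardware.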

  • Article Type: Journal Article
    This narrative literature review undertakes a comprehensive examination of the burgeoning field, tracing the development of artificial intelligence (AI)-powered tools for depression and anxiety detection from the level of intricate algorithms to practical applications. Delivering essential mental health care services is now a significant public health priority. In recent years, AI has become a game-changer in the early identification of and intervention in these pervasive mental health disorders. AI tools can potentially empower behavioral healthcare services by helping psychiatrists collect objective data on patients' progress and tasks. This study emphasizes the current understanding of AI, the different types of AI, its current use in multiple mental health disorders, its advantages and disadvantages, and its future potential. As technology develops and the digitalization of the modern era increases, the application of artificial intelligence in psychiatry will grow; therefore, a comprehensive understanding will be needed. We searched PubMed, Google Scholar, and ScienceDirect using relevant keywords. A recent review of studies using electronic health records (EHRs) with AI and machine learning techniques for diagnosing all clinical conditions found roughly 99 publications. Of these, 35 studies were identified as addressing mental health disorders in all age groups, and among them, six studies utilized EHR data sources. By critically analyzing prominent scholarly works, we aim to illuminate the current state of this technology, exploring its successes, limitations, and future directions. In doing so, we hope to contribute to a nuanced understanding of AI's potential to revolutionize mental health diagnostics and pave the way for further research and development in this critically important domain.

  • Article Type: Journal Article
    Developers and vendors of large language models ("LLMs"), with ChatGPT, Google Bard, and Microsoft's Bing at the forefront, can be subject to the Health Insurance Portability and Accountability Act of 1996 ("HIPAA") when they process protected health information ("PHI") on behalf of HIPAA-covered entities. In doing so, they become business associates, or subcontractors of a business associate, under HIPAA.

  • Article Type: Journal Article
    This study is centered on investigating the acceptance and utilization of AI chatbot technology among graduate students in China and its implications for higher education. Employing a fusion of the UTAUT (Unified Theory of Acceptance and Use of Technology) model and the ECM (Expectation-Confirmation Model), the research seeks to pinpoint the pivotal factors influencing students' attitudes, satisfaction, and behavioral intentions regarding AI chatbots. The study constructs a model comprising seven substantial predictors aimed at precisely foreseeing users' intentions and behavior with AI chatbots. Collected from 373 students enrolled in various universities across China, the self-reported data were analyzed using the partial least squares method of structural equation modeling to confirm the model's reliability and validity (see the code sketch after this abstract). The findings validate seven of the eleven proposed hypotheses, underscoring the influential role of the ECM constructs, particularly "Confirmation" and "Satisfaction," which outweigh the impact of the UTAUT constructs on users' behavior. Specifically, users' perceived confirmation significantly influences their satisfaction and subsequent intention to continue using AI chatbots. Additionally, "Personal innovativeness" emerges as a critical determinant shaping users' behavioral intention. This research emphasizes the need for further exploration of AI tool adoption in educational settings and encourages continued investigation of their potential in teaching and learning environments.
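
    The abstract above reports confirming the measurement model's reliability and validity via PLS-SEM; as one illustrative first step (not the authors' full analysis), a common reliability check is Cronbach's alpha over each construct's Likert items. The construct name, item columns, and responses below are hypothetical.

```python
# Hedged sketch of a Cronbach's alpha reliability check with pandas.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = (k/(k-1)) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical responses to three "Satisfaction" items on a 5-point scale.
satisfaction = pd.DataFrame({
    "sat1": [4, 5, 3, 4, 2, 5],
    "sat2": [4, 4, 3, 5, 2, 5],
    "sat3": [5, 5, 2, 4, 3, 4],
})
print(f"alpha = {cronbach_alpha(satisfaction):.2f}")
```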

  • Article Type: Journal Article
    BACKGROUND: Artificial intelligence (AI) is a rapidly developing field with the potential to transform various aspects of health care and public health, including medical training. During the "Hygiene and Public Health" course for fifth-year medical students, a practical training session was conducted on vaccination using AI chatbots as an educational supportive tool. Before receiving specific training on vaccination, the students were given a web-based test extracted from the Italian National Medical Residency Test. After completing the test, a critical correction of each question was performed assisted by AI chatbots.
    OBJECTIVE: The main aim of this study was to identify whether AI chatbots can be considered educational support tools for training in public health. The secondary objective was to assess the performance of different AI chatbots on complex multiple-choice medical questions in the Italian language.
    METHODS: A test composed of 15 multiple-choice questions on vaccination was extracted from the Italian National Medical Residency Test using targeted keywords and administered to medical students via Google Forms and to different AI chatbot models (Bing Chat, ChatGPT, Chatsonic, Google Bard, and YouChat). The correction of the test was conducted in the classroom, focusing on the critical evaluation of the explanations provided by the chatbot. A Mann-Whitney U test was conducted to compare the performances of medical students and AI chatbots (see the code sketch after this abstract). Student feedback was collected anonymously at the end of the training experience.
    RESULTS: In total, 36 medical students and 5 AI chatbot models completed the test. The students achieved an average score of 8.22 (SD 2.65) out of 15, while the AI chatbots scored an average of 12.22 (SD 2.77). The results indicated a statistically significant difference in performance between the 2 groups (U=49.5, P<.001), with a large effect size (r=0.69). When divided by question type (direct, scenario-based, and negative), significant differences were observed in direct (P<.001) and scenario-based (P<.001) questions, but not in negative questions (P=.48). The students reported a high level of satisfaction (7.9/10) with the educational experience, expressing a strong desire to repeat the experience (7.6/10).
    CONCLUSIONS: This study demonstrated the efficacy of AI chatbots in answering complex medical questions related to vaccination and providing valuable educational support. Their performance significantly surpassed that of medical students in direct and scenario-based questions. The responsible and critical use of AI chatbots can enhance medical education, making it an essential aspect to integrate into the educational system.
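
    A minimal sketch of the Mann-Whitney U comparison described in the methods above, with the effect size r computed from the normal approximation of U; the score vectors are hypothetical draws matching the reported group means, not the study data.

```python
# Hedged illustration of the Mann-Whitney U test and effect size r.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical scores out of 15 (reported means: students 8.22, chatbots 12.22).
student_scores = rng.normal(8.22, 2.65, size=36).clip(0, 15)
chatbot_scores = rng.normal(12.22, 2.77, size=5).clip(0, 15)

u_stat, p_value = mannwhitneyu(student_scores, chatbot_scores,
                               alternative="two-sided")

# Effect size r = |z| / sqrt(N), with z from the normal approximation of U.
n1, n2 = len(student_scores), len(chatbot_scores)
z = (u_stat - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
r = abs(z) / np.sqrt(n1 + n2)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}, r = {r:.2f}")
```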

  • Article Type: Journal Article
    OBJECTIVE: To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots, namely ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities.
    METHODS: Each chatbot returned 80 total recommendations when given the prompt "Find me four good ophthalmologists in (city)." Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)); see the code sketch after this abstract. Pearson's chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and recommendation accuracy.
    RESULTS: The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). ChatGPT recommended fewer female (29.5%) than male ophthalmologists (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) gave high rates of inaccurate recommendations. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots.
    CONCLUSIONS: This study revealed substantial bias and inaccuracy in the AI chatbots' recommendations. They struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city. Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all chatbots favored recommending ophthalmologists in academic medicine.
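
    A minimal sketch of the one-proportion z-test described in the methods above, using statsmodels; the counts are hypothetical values chosen to be consistent with the reported 1.61% female rate for Bing Chat, not the study data.

```python
# Hedged illustration of a one-proportion z-test against a national benchmark.
from statsmodels.stats.proportion import proportions_ztest

national_female_rate = 0.272  # AAMC national proportion (from the abstract)

# Hypothetical counts: 1 female out of 62 identifiable recommendations (~1.61%).
n_female = 1
n_recommended = 62

stat, p_value = proportions_ztest(count=n_female, nobs=n_recommended,
                                  value=national_female_rate,
                                  alternative="two-sided")
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```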

  • Article Type: Journal Article
    BACKGROUND: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown.
    OBJECTIVE: This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan.
    METHODS: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and presented them as clinical vignettes. Physicians entered the text of the clinical vignettes into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, the top 5 differential diagnosis lists, and the top diagnosis (see the code sketch after this abstract).
    RESULTS: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The rates of correct diagnosis by ChatGPT-3.5 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 73% (38/52), 65% (34/52), and 42% (22/52), respectively. The rates of correct diagnosis by ChatGPT-4 were comparable to those by physicians within the top 10 (43/52, 83% vs 39/52, 75%, respectively; P=.47) and within the top 5 (42/52, 81% vs 35/52, 67%, respectively; P=.18) differential diagnosis lists and top diagnosis (31/52, 60% vs 26/52, 50%, respectively; P=.43), although the difference was not significant. The ChatGPT models' diagnostic accuracy did not significantly vary based on open access status or the publication date (before 2011 vs 2022).
    CONCLUSIONS: This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with the GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT by using distinct case materials beyond its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.
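
    A minimal sketch of the top-k scoring described in the methods above; the cases and diagnosis strings are hypothetical stand-ins, and real use would need careful matching of diagnosis synonyms rather than exact string comparison.

```python
# Hedged illustration of top-k correct-diagnosis rates over case vignettes.
from typing import Dict, List

def topk_accuracy(cases: List[Dict], k: int) -> float:
    """Fraction of cases whose final diagnosis appears in the top-k list."""
    hits = sum(
        1 for case in cases
        if case["final_diagnosis"].lower() in
        (d.lower() for d in case["differentials"][:k])
    )
    return hits / len(cases)

# Hypothetical cases: final diagnosis plus a ranked differential list.
cases = [
    {"final_diagnosis": "Giant cell arteritis",
     "differentials": ["Giant cell arteritis", "Takayasu arteritis",
                       "Polymyalgia rheumatica"]},
    {"final_diagnosis": "Miliary tuberculosis",
     "differentials": ["Sarcoidosis", "Lymphoma", "Miliary tuberculosis"]},
]

for k in (10, 5, 1):
    print(f"top-{k} accuracy: {topk_accuracy(cases, k):.0%}")
```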

  • Article Type: Editorial
    The exponential growth of ChatGPT in medical literature, amassing over 1000 PubMed citations by August 2023, underscores a pivotal juncture in the convergence of artificial intelligence (AI) and healthcare. This remarkable rise not only showcases its potential to revolutionize medical academia but also indicates its impending influence on patient care and healthcare systems. Notwithstanding this enthusiasm, one-third of these citations are editorials or commentaries, stressing a gap in empirical research. Alongside its potential, there are concerns about ChatGPT becoming a "Weapon of Mass Deception" and the need for rigorous evaluations to counter inaccuracies. The World Association of Medical Editors has released guidelines emphasizing that AI tools should not be manuscript co-authors and advocates for clear disclosures in AI-assisted academic works. Interestingly, ChatGPT achieved its citation milestone within nine months, compared to Google's 14 years. As Large Language Models (LLMs), like ChatGPT, become more integral in healthcare, issues surrounding data protection, patient privacy, and ethical implications gain prominence. As the future of LLM research unfolds, key areas of interest include its efficacy in clinical settings, its role in telemedicine, and its potential in medical education. The journey ahead necessitates a harmonious partnership between the medical community and AI developers, emphasizing both technological advancements and ethical considerations.