Large language model

  • Article type: Journal Article
    BACKGROUND: With their unmatched ability to interpret and engage with human language and context, large language models (LLMs) hint at the potential to bridge AI and human cognitive processes. This review explores the current application of LLMs, such as ChatGPT, in the field of psychiatry.
    METHODS: We followed PRISMA guidelines and searched PubMed, Embase, Web of Science, and Scopus, up until March 2024.
    RESULTS: From 771 retrieved articles, we included 16 that directly examine LLMs' use in psychiatry. LLMs, particularly ChatGPT and GPT-4, showed diverse applications in clinical reasoning, social media, and education within psychiatry. They can assist in diagnosing mental health issues, managing depression, evaluating suicide risk, and supporting education in the field. However, our review also points out their limitations, such as difficulties with complex cases and potential underestimation of suicide risks.
    CONCLUSIONS: Early research in psychiatry reveals LLMs' versatile applications, from diagnostic support to educational roles. Given the rapid pace of advancement, future investigations are poised to explore the extent to which these models might redefine traditional roles in mental health care.

  • Article type: Journal Article
    BACKGROUND: The world has witnessed increased adoption of large language models (LLMs) in the last year. Although the products developed using LLMs have the potential to solve accessibility and efficiency problems in health care, there is a lack of available guidelines for developing LLMs for health care, especially for medical education.
    OBJECTIVE: The aim of this study was to identify and prioritize the enablers for developing successful LLMs for medical education. We further evaluated the relationships among these identified enablers.
    METHODS: A narrative review of the extant literature was first performed to identify the key enablers for LLM development. We additionally gathered the opinions of LLM users to determine the relative importance of these enablers using an analytical hierarchy process (AHP), which is a multicriteria decision-making method. Further, total interpretive structural modeling (TISM) was used to analyze the perspectives of product developers and ascertain the relationships and hierarchy among these enablers. Finally, the cross-impact matrix-based multiplication applied to a classification (MICMAC) approach was used to determine the relative driving and dependence powers of these enablers. A nonprobabilistic purposive sampling approach was used for recruitment of focus groups.
    RESULTS: The AHP demonstrated that the most important enabler for LLMs was credibility, with a priority weight of 0.37, followed by accountability (0.27642) and fairness (0.10572). In contrast, usability, with a priority weight of 0.04, showed negligible importance. The results of TISM concurred with the findings of the AHP. The only striking difference between expert perspectives and user preference evaluation was that the product developers indicated that cost has the least importance as a potential enabler. The MICMAC analysis suggested that cost has a strong influence on other enablers. The inputs of the focus group were found to be reliable, with a consistency ratio less than 0.1 (0.084).
    CONCLUSIONS: This study is the first to identify, prioritize, and analyze the relationships of enablers of effective LLMs for medical education. Based on the results of this study, we developed a comprehensible prescriptive framework, named CUC-FATE (Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability), for evaluating the enablers of LLMs in medical education. The study findings are useful for health care professionals, health technology experts, medical technology regulators, and policy makers.
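    To make the AHP machinery concrete, here is a minimal sketch of how priority weights (such as the 0.37 reported for credibility) and the consistency-ratio check (judgments deemed reliable below 0.1; the study reports 0.084) fall out of a pairwise-comparison matrix. The matrix entries below are hypothetical, not the study's focus-group judgments.

```python
import numpy as np

# Minimal AHP sketch, assuming a hypothetical 3x3 pairwise-comparison
# matrix (NOT the study's focus-group data). Entry A[i][j] says how much
# more important enabler i is than enabler j on Saaty's 1-9 scale.
A = np.array([
    [1.0, 3.0, 5.0],    # credibility vs. accountability, fairness
    [1/3, 1.0, 3.0],    # accountability
    [1/5, 1/3, 1.0],    # fairness
])
n = A.shape[0]

# Priority weights: the principal eigenvector of A, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()

# Consistency ratio CR = CI / RI, where CI = (lambda_max - n) / (n - 1)
# and RI is Saaty's tabulated random index for matrices of size n.
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}
CI = (eigvals.real[k] - n) / (n - 1)
CR = CI / RI[n]

print("priority weights:", np.round(weights, 3))  # here ~[0.63, 0.26, 0.11]
print("consistency ratio:", round(CR, 3))         # judgments usable if CR < 0.1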

  • Article type: Journal Article
    OBJECTIVE: The rapid expansion of the biomedical literature challenges traditional review methods, especially during outbreaks of emerging infectious diseases when quick action is critical. Our study aims to explore the potential of ChatGPT to automate the biomedical literature review for rapid drug discovery.
    METHODS: We introduce a novel automated pipeline that helps identify drugs for a given virus in response to a potential future global health threat. Our approach can be used to select PubMed articles that identify a drug target for the given virus. We tested our approach on two known pathogens: SARS-CoV-2, where the literature is vast, and Nipah, where the literature is sparse. Specifically, a panel of three experts reviewed a set of PubMed articles and labeled each as either describing a drug target for the given virus or not. The same task was given to the automated pipeline, and its performance was judged on whether it labeled the articles the same way as the human experts. We applied a number of prompt engineering techniques to improve the performance of ChatGPT.
    RESULTS: Our best configuration used GPT-4 by OpenAI and achieved an out-of-sample validation performance with accuracy/F1-score/sensitivity/specificity of 92.87%/88.43%/83.38%/97.82% for SARS-CoV-2 and 87.40%/73.90%/74.72%/91.36% for Nipah.
    CONCLUSIONS: These results highlight the utility of ChatGPT in drug discovery and development and reveal its potential to enable rapid drug target identification during a pandemic-level health emergency.
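    As a point of reference for the reported accuracy/F1-score/sensitivity/specificity figures, the sketch below computes the same four metrics from a confusion matrix of pipeline labels against the expert panel's labels. The counts are illustrative, not the study's data.

```python
# Minimal sketch of the validation metrics the study reports, computed from
# a confusion matrix of expert labels vs. pipeline labels (illustrative counts).
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    sensitivity = tp / (tp + fn)                 # recall on true drug-target articles
    specificity = tn / (tn + fp)                 # recall on non-target articles
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "f1": f1,
            "sensitivity": sensitivity, "specificity": specificity}

print(classification_metrics(tp=83, fp=5, tn=220, fn=17))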

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM.
    OBJECTIVE: Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field.
    METHODS: Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data.
    RESULTS: A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills.
    CONCLUSIONS: LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians' AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied.

  • Article type: Preprint
    BACKGROUND: The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators.
    OBJECTIVE: This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare.
    METHODS: We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns.
    RESULTS: Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction, and (4) administration, and four categories of concerns: (1) reliability, (2) bias, (3) privacy, and (4) public acceptability. Forty-nine (75%) research papers used LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) expressed concerns about reliability and/or bias. We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are not able to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, none of the reviewed papers conducted experiments to carefully examine how conversational LLMs lead to bias or privacy issues in healthcare research.
    CONCLUSIONS: Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs and to promote, improve, and regulate the application of LLMs in healthcare.

  • Article type: Journal Article
    OBJECTIVE: The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.
    METHODS: After a comprehensive search of the PubMed, Web of Science, Embase, and Google Scholar databases, we identified published studies, up to January 1, 2024, that utilized ChatGPT for clinical radiology applications.
    RESULTS: Out of 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported quantitative performance figures for ChatGPT. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and in five (5/24; 20.8%) studies, there was a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared the two most recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPT v4 outperformed v3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.
    CONCLUSIONS: Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

  • Article type: Journal Article
    BACKGROUND: ChatGPT has a wide range of applications in the medical field. Therefore, this review aims to define the key issues and provide a comprehensive view of the literature on the application of ChatGPT in medicine.
    METHODS: This scoping review follows Arksey and O'Malley's five-stage framework. A comprehensive literature search of publications (30 November 2022 to 16 August 2023) was conducted. Six databases were searched, and relevant references were systematically catalogued. Attention was focused on the general characteristics of the articles, their fields of application, and the advantages and disadvantages of using ChatGPT. Descriptive statistics and narrative synthesis methods were used for data analysis.
    RESULTS: Of the 3426 studies, 247 met the criteria for inclusion in this review. The majority of articles (31.17%) were from the United States. Editorials (43.32%) ranked first, followed by experimental studies (11.74%). The potential applications of ChatGPT in medicine are varied, with the largest number of studies (45.75%) exploring clinical practice, including assisting with clinical decision support and providing disease information and medical advice. This was followed by medical education (27.13%) and scientific research (16.19%). Among individual disciplines, radiology, surgery, and dentistry topped the list. However, ChatGPT in medicine also faces issues of data privacy, inaccuracy, and plagiarism.
    CONCLUSIONS: The application of ChatGPT in medicine spans different disciplines and general application scenarios. ChatGPT has a paradoxical nature: it offers significant advantages but at the same time raises great concerns about its application in healthcare settings. Therefore, it is imperative to develop theoretical frameworks that not only address its widespread use in healthcare but also facilitate comprehensive assessment. In addition, these frameworks should contribute to the development of strict and effective guidelines and regulatory measures.

  • Article type: Journal Article
    OBJECTIVE: Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research.
    METHODS: An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327.
    RESULTS: A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall pooled accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question source, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details, such as the date of inquiry, the version of ChatGPT, and inter-rater consistency.
    CONCLUSIONS: This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of the study designs and insufficient reporting might affect the results' reliability. Our proposed evaluation framework provides insights for future study design and transparent reporting of LLMs in responding to medical questions.
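    For readers who want to see what a pooled accuracy of 56% (95% CI: 51%-60%, I² = 87%) corresponds to computationally, the following is a minimal random-effects (DerSimonian-Laird) pooling sketch on logit-transformed study accuracies. The four input studies are made up for illustration, not the 17 actually meta-analyzed.

```python
import numpy as np

# Minimal DerSimonian-Laird random-effects pooling sketch on logit-transformed
# accuracies: the kind of computation behind a pooled accuracy with a 95% CI
# and an I^2 heterogeneity statistic. Inputs are illustrative.
correct = np.array([40, 55, 30, 70])     # correct answers per study
total   = np.array([80, 90, 60, 120])    # questions per study

p = correct / total
y = np.log(p / (1 - p))                      # logit accuracy per study
v = 1 / correct + 1 / (total - correct)      # variance of each logit

w = 1 / v                                    # fixed-effect weights
y_fe = (w * y).sum() / w.sum()
Q = (w * (y - y_fe) ** 2).sum()              # Cochran's Q
df = len(y) - 1
I2 = max(0.0, (Q - df) / Q) * 100            # I^2 heterogeneity (%)

tau2 = max(0.0, (Q - df) / (w.sum() - (w**2).sum() / w.sum()))
w_re = 1 / (v + tau2)                        # random-effects weights
y_re = (w_re * y).sum() / w_re.sum()
se = np.sqrt(1 / w_re.sum())

expit = lambda x: 1 / (1 + np.exp(-x))       # back-transform logit -> proportion
print(f"pooled accuracy {expit(y_re):.2f} "
      f"(95% CI {expit(y_re - 1.96*se):.2f}-{expit(y_re + 1.96*se):.2f}), "
      f"I^2 = {I2:.0f}%")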

  • Article type: Journal Article
    BACKGROUND: Online patient reviews are crucial in guiding individuals who seek plastic surgery, but AI chatbots pose a threat of disseminating fake reviews. This study aimed to compare real patient feedback with ChatGPT-generated reviews for the top five US plastic surgery procedures.
    METHODS: Thirty real patient reviews each on rhinoplasty, blepharoplasty, facelift, liposuction, and breast augmentation were collected from RealSelf and used as templates for ChatGPT to generate matching patient reviews. Prolific users (n = 30) assessed 150 pairs of reviews to identify human-written and artificial intelligence (AI)-generated reviews. Patient reviews were further assessed using AI content detector software (Copyleaks AI).
    RESULTS: Among the 9000 classification tasks, 64.3% and 35.7% of reviews were classified as authentic and fake, respectively. On average, the author (human versus machine) was correctly identified in 59.6% of cases, and this poor classification performance was consistent across all procedures. Patients with prior aesthetic treatment showed poorer classification performance than those without (p < 0.05). The mean character count in human-written reviews was significantly higher than that in AI-generated reviews (p < 0.001), with a significant correlation between character count and participants' accuracy rate (p < 0.001). The emotional timbre of reviews differed significantly, with "happiness" being more prevalent in human-written reviews (p < 0.001) and "disappointment" being more prevalent in AI reviews (p = 0.005). Copyleaks AI correctly classified 96.7% and 69.3% of human-written and ChatGPT-generated reviews, respectively.
    CONCLUSIONS: ChatGPT convincingly replicates authentic patient reviews, even deceiving commercial AI detection software. Analyzing emotional tone and review length can help differentiate real from fake reviews, underscoring the need to educate both patients and physicians to prevent misinformation and mistrust.
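    As a sketch of the statistical comparisons behind these results (the character-count difference between human and AI reviews, and the length-accuracy correlation), the snippet below runs analogous tests on synthetic data. None of the numbers are the study's; the lengths, accuracy values, and effect sizes are invented for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative sketch (synthetic numbers, NOT the study's data) of the two
# reported analyses: comparing character counts of human vs. AI reviews,
# and correlating review length with rater accuracy.
rng = np.random.default_rng(0)
human_len = rng.normal(600, 150, 150)    # hypothetical character counts
ai_len = rng.normal(450, 120, 150)

t, p = stats.ttest_ind(human_len, ai_len, equal_var=False)   # Welch's t-test
print(f"length difference: t = {t:.2f}, p = {p:.3g}")

# Per-review rater accuracy as a (hypothetical) increasing function of length.
lengths = np.concatenate([human_len, ai_len])
accuracy = np.clip(0.35 + 0.0004 * lengths + rng.normal(0, 0.05, 300), 0, 1)
r, p = stats.pearsonr(lengths, accuracy)
print(f"length-accuracy correlation: r = {r:.2f}, p = {p:.3g}")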