Chatbot

  • Article type: Journal Article
    Artificial intelligence (AI) refers to computer systems doing tasks that usually need human intelligence. AI is constantly changing and is revolutionizing the healthcare field, including nutrition. This review's purpose is four-fold: (i) to investigate AI's role in nutrition research; (ii) to identify areas in nutrition using AI; (iii) to understand AI's future potential impact; (iv) to investigate possible concerns about AI's use in nutrition research. Eight databases were searched: PubMed, Web of Science, EBSCO, Agricola, Scopus, IEEE Xplore, Google Scholar, and Cochrane. A total of 1737 articles were retrieved, of which 22 were included in the review. Article screening phases included duplicate elimination, title-abstract selection, full-text review, and quality assessment. The key findings indicated AI's role in nutrition is at a developmental stage, focusing mainly on dietary assessment and less on malnutrition prediction, lifestyle interventions, and comprehension of diet-related diseases. Clinical research is needed to determine AI's intervention efficacy. The ethics of AI use, a main concern, remains unresolved and needs to be considered to prevent collateral damage to certain populations. The heterogeneity of the studies in this review limited the focus on specific nutritional areas. Future research should prioritize specialized reviews in nutrition and dieting for a deeper understanding of AI's potential in human nutrition.

  • Article type: Journal Article
    OBJECTIVE: To develop a healthcare chatbot service (AI-guide bot) that conducts real-time conversations using large language models to provide accurate health information to patients.
    METHODS: To provide accurate and specialized medical responses, we integrated several cancer practice guidelines. The size of the integrated meta-dataset was 1.17 million tokens. The integrated and classified metadata were extracted, transformed into text, segmented to specific character lengths, and vectorized using an embedding model. The AI-guide bot was implemented using Python 3.9. To enhance scalability and incorporate the integrated dataset, we combined the AI-guide bot with OpenAI and the LangChain framework. To generate user-friendly conversations, a language model was developed based on Chat Generative Pre-trained Transformer (ChatGPT), an interactive conversational chatbot powered by GPT-3.5. The AI-guide bot was implemented using ChatGPT-3.5 from Sep. 2023 to Jan. 2024.
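    The methods above amount to a retrieval-augmented generation pipeline: guideline text is chunked, embedded, indexed, and retrieved to ground GPT-3.5's answers. A minimal sketch of such a pipeline follows, using LangChain's classic API (exact imports vary by version); the file name, chunk size, model choice, and sample question are illustrative assumptions, not details from the study.

    # Hypothetical sketch (not the authors' code): split guideline text into
    # fixed-length chunks, embed them, index them, and answer questions with
    # GPT-3.5 grounded in the retrieved chunks.
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chat_models import ChatOpenAI
    from langchain.chains import RetrievalQA

    guideline_text = open("cancer_guidelines.txt", encoding="utf-8").read()  # hypothetical file

    # Segment the text to a specific character length, as described in the abstract.
    chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(guideline_text)

    # Vectorize the chunks with an embedding model and build a searchable index.
    index = FAISS.from_texts(chunks, OpenAIEmbeddings())

    # Answer a patient question with GPT-3.5, grounded in retrieved guideline chunks.
    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
        retriever=index.as_retriever(),
    )
    print(qa.run("What are common side effects of chemotherapy for colon cancer?"))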
    RESULTS: The AI-guide bot allowed users to select their desired cancer type and language for conversational interactions. The AI-guide bot was designed to expand its capabilities to encompass multiple major cancer types. The performance of the AI-guide bot responses was 90.98 ± 4.02 (obtained by summing the Likert scores).
    CONCLUSIONS: The AI-guide bot can provide medical information quickly and accurately to patients with cancer who are concerned about their health.

  • Article type: Journal Article
    BACKGROUND: Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI's GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet, rigorous investigations comparing their performance to that of human experts remain sparse.
    OBJECTIVE: This study aimed to compare the medical accuracy of GPT-4 with that of human experts in providing medical advice using real-world user-generated queries, with a specific focus on cardiology. It also sought to analyze the performance of GPT-4 and human experts in specific question categories, including drug or medication information and preliminary diagnoses.
    METHODS: We collected 251 pairs of cardiology-specific questions from general users and answers from human experts via an internet portal. GPT-4 was tasked with generating responses to the same questions. Three independent cardiologists (SL, JHK, and JJC) evaluated the answers provided by both human experts and GPT-4. Using a computer interface, each evaluator compared the pairs and determined which answer was superior, and they quantitatively measured the clarity and complexity of the questions as well as the accuracy and appropriateness of the responses, applying a 3-tiered grading scale (low, medium, and high). Furthermore, a linguistic analysis was conducted to compare the length and vocabulary diversity of the responses using word count and type-token ratio.
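    For reference, the two linguistic measures used here are simple to compute: word count and type-token ratio (unique tokens divided by total tokens, so lower values mean a less diverse vocabulary). A minimal sketch, with a naive tokenizer as an assumption since the study does not specify one:

    import re

    def linguistic_measures(text: str) -> tuple[int, float]:
        # Word count and type-token ratio for one response.
        tokens = re.findall(r"[a-z']+", text.lower())  # naive tokenization (assumption)
        return len(tokens), len(set(tokens)) / len(tokens)

    count, ttr = linguistic_measures("The heart pumps blood. The blood carries oxygen.")
    print(count, round(ttr, 2))  # 8 tokens, 6 unique types -> TTR 0.75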
    RESULTS: GPT-4 and human experts displayed comparable efficacy in medical accuracy ("GPT-4 is better" at 132/251, 52.6% vs "Human expert is better" at 119/251, 47.4%). In accuracy level categorization, humans had more high-accuracy responses than GPT-4 (50/237, 21.1% vs 30/238, 12.6%) but also a greater proportion of low-accuracy responses (11/237, 4.6% vs 1/238, 0.4%; P=.001). GPT-4 responses were generally longer and used a less diverse vocabulary than those of human experts, potentially enhancing their comprehensibility for general users (sentence count: mean 10.9, SD 4.2 vs mean 5.9, SD 3.7; P<.001; type-token ratio: mean 0.69, SD 0.07 vs mean 0.79, SD 0.09; P<.001). Nevertheless, human experts outperformed GPT-4 in specific question categories, notably those related to drug or medication information and preliminary diagnoses. These findings highlight the limitations of GPT-4 in providing advice based on clinical experience.
    CONCLUSIONS: GPT-4 has shown promising potential in automated medical consultation, with medical accuracy comparable to that of human experts. However, challenges remain, particularly in the realm of nuanced clinical judgment. Future improvements in LLMs may require the integration of specific clinical reasoning pathways and regulatory oversight for safe use. Further research is needed to understand the full potential of LLMs across various medical specialties and conditions.

  • Article type: Journal Article
    The much-hyped artificial intelligence (AI) model ChatGPT, developed by OpenAI, can have great benefits for physicians, especially pathologists, by saving time so that they can devote it to more significant work. Generative AI is a special class of AI model that uses patterns and structures learned from existing data to create new data. Utilizing ChatGPT in pathology offers a multitude of benefits, encompassing the summarization of patient records, its promising prospects in digital pathology, and its valuable contributions to education and research in this field. However, certain roadblocks need to be addressed, such as integrating ChatGPT with image analysis, which could revolutionize the field of pathology by increasing diagnostic accuracy and precision. The challenges of using ChatGPT encompass biases from its training data, the need for ample input data, potential risks related to bias and transparency, and the potential adverse outcomes arising from inaccurate content generation. A further prospect is the generation of meaningful insights from textual information combined with efficient processing of different types of image data, such as medical images and pathology slides. Due consideration should be given to ethical and legal issues, including bias.

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    OBJECTIVE: To assess the ability of ChatGPT-4, an automated chatbot powered by artificial intelligence (AI), to answer common patient questions concerning the Latarjet procedure for patients with anterior shoulder instability, and to compare this performance with that of Google Search Engine.
    METHODS: Using previously validated methods, a Google search was first performed using the query "Latarjet." Subsequently, the top ten frequently asked questions (FAQs) and associated sources were extracted. ChatGPT-4 was then prompted to provide the top ten FAQs and answers concerning the procedure. This process was repeated to identify additional FAQs requiring discrete-numeric answers to allow for a comparison between ChatGPT-4 and Google. Discrete, numeric answers were subsequently assessed for accuracy based on the clinical judgement of two fellowship-trained sports medicine surgeons blinded to search platform.
    RESULTS: Mean (±standard deviation) accuracy for numeric-based answers was 2.9±0.9 for ChatGPT-4 versus 2.5±1.4 for Google (p=0.65). ChatGPT-4 derived information for its answers only from academic sources, which differed significantly from Google Search Engine (p=0.003), where only 30% of sources were academic and the remainder were websites from individual surgeons (50%) and larger medical practices (20%). For general FAQs, 40% of FAQs were found to be identical when comparing ChatGPT-4 and Google Search Engine. In terms of sources used to answer these questions, ChatGPT-4 again used 100% academic resources, while Google Search Engine used 60% academic resources, 20% surgeon personal websites, and 20% medical practice websites (p=0.087).
    CONCLUSIONS: ChatGPT-4 demonstrated the ability to provide accurate and reliable information about the Latarjet procedure in response to patient queries, using multiple academic sources in all cases. This was in contrast to Google Search Engine, which more frequently used single surgeon and large medical practice websites. Despite differences in the resources accessed to perform information retrieval tasks, the clinical relevance and accuracy of information provided did not significantly differ between ChatGPT-4 and Google Search Engine.

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.
    OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types.
    METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications.
    RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions.
    CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence (AI)-based chatbots have shown promise in providing counseling to patients with metabolic dysfunction-associated steatotic liver disease (MASLD). While ChatGPT3.5 has demonstrated the ability to comprehensively answer MASLD-related questions in English, its accuracy remains suboptimal. Whether language influences these results is unclear. This study aims to assess ChatGPT's performance as a counseling tool for Italian MASLD patients.
    METHODS: Thirteen Italian experts rated the accuracy, completeness, and comprehensibility of ChatGPT3.5 in answering 15 MASLD-related questions in Italian, using a six-point Likert scale for accuracy and three-point Likert scales for completeness and comprehensibility.
    RESULTS: Mean scores for accuracy, completeness and comprehensibility were 4.57 ± 0.42, 2.14 ± 0.31 and 2.91 ± 0.07, respectively. The physical activity domain achieved the highest mean scores for accuracy and completeness, whereas the specialist referral domain achieved the lowest. Overall, Fleiss's coefficient of concordance for accuracy, completeness and comprehensibility across all 15 questions was 0.016, 0.075 and -0.010, respectively. Age and academic role of the evaluators did not influence the scores. The results were not significantly different from our previous study focusing on English.
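    The near-zero concordance values reported above can be reproduced with a standard Fleiss' kappa computation; a minimal sketch using statsmodels follows, with synthetic ratings standing in for the study's raw data (13 raters, 15 questions, six-point accuracy scale), since the raw scores are not in the abstract.

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    rng = np.random.default_rng(0)
    ratings = rng.integers(1, 7, size=(15, 13))  # rows: questions, cols: raters (synthetic)

    # aggregate_raters turns raw scores into per-question category counts.
    counts, _ = aggregate_raters(ratings)
    print(fleiss_kappa(counts))  # near 0 => agreement no better than chance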
    CONCLUSIONS: Language does not appear to affect ChatGPT's ability to provide comprehensible and complete counseling to MASLD patients, but accuracy remains suboptimal in certain domains.

  • Article type: Journal Article
    Electronic health (eHealth) and mobile health (mHealth) could stimulate physical activity (PA) in a time-efficient and cost-effective way. This randomized controlled trial aims to investigate the effects of different combined computer- and mobile-based PA interventions on moderate-to-vigorous PA (MVPA) in adults aged 50 years and over. Participants (N = 954) were randomly allocated to a basic existing computer-based intervention (Active Plus [AP] or I Move [IM]) supplemented with one of three mobile elements, namely (1) an activity tracker (AT), (2) an ecological momentary intervention (EMI), or (3) a chatbot (CB), or to a control group (CG). MVPA was assessed via the SQUASH questionnaire at baseline (T0), 3 months (T1), and 6 months (T2), and via accelerometers at T0 and T2. No intervention effects were found on objective (p = .502) or subjective (p = .368) MVPA for the main research groups (AP/IM + AT, AP/IM + EMI, AP/IM + CB). Preliminary MVPA findings for subgroups (AP + AT, AP + EMI, AP + CB, IM + AT, IM + EMI, IM + CB), combined with dropout data, showed potential for the computer-based intervention AP with an integrated AT. Based on these preliminary findings, eHealth developers may be advised to integrate ATs into existing computer-based PA interventions. However, given the exploratory nature of the subgroup analyses, further research is recommended to confirm these findings.

  • Article type: Journal Article
    BACKGROUND: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as "Gemini"; Google AI) were developed using distinct approaches, but little has been studied about differences in their capability to generate abstracts. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy.
    OBJECTIVE: The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery.
    METHODS: In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors.
    RESULTS: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 48.4% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively.
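    To make the reported sensitivity and specificity concrete, a minimal sketch follows; treating a human-written abstract correctly identified as human-written as a true positive is an assumption (the abstract does not name the positive class), but it reproduces the figures above.

    def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
        # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
        return tp / (tp + fn), tn / (tn + fp)

    # 63 of 112 human-written abstracts were recognized as human-written;
    # 62 of 128 AI-generated abstracts were recognized as AI-generated.
    sens, spec = sensitivity_specificity(tp=63, fn=112 - 63, tn=62, fp=128 - 62)
    print(round(sens, 3), round(spec, 3))  # 0.562, 0.484 -> the reported 56.3% and 48.4%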
    CONCLUSIONS: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard.
