Keywords: AI-generated scientific content, ChatGPT, LLM, NLP, abstract, abstracts, academic research, artificial intelligence, extract, extraction, generation, generative, language model, language models, natural language processing, plagiarism, publication, publications, scientific research, text, textual

MeSH: Humans; Cross-Sectional Studies; Artificial Intelligence; Research; Research Personnel; Language

Source: DOI: 10.2196/51229 | PDF (PubMed)

Abstract:
ChatGPT may act as a research assistant, helping to organize the direction of thinking and summarize research findings. However, few studies have examined the quality, similarity (ie, how closely a generated abstract resembles the original), and accuracy of the abstracts generated by ChatGPT when researchers provide full-text basic research papers.
We aimed to assess the applicability of an artificial intelligence (AI) model in generating abstracts for basic preclinical research.
We selected 30 basic research papers from Nature, Genome Biology, and Biological Psychiatry. After removing the original abstracts, we input the full text of each paper into ChatPDF, an application built on a ChatGPT-based language model, and prompted it to generate an abstract in the same style as the original paper. A total of 8 experts were invited to rate the quality of these abstracts (on a Likert scale of 0-10) and, in a blinded manner, to identify which abstracts were generated by ChatPDF. The generated abstracts were also evaluated for their similarity to the original abstracts and for the accuracy of the AI-generated content.
The quality of the ChatGPT-generated abstracts was lower than that of the actual abstracts (10-point Likert scale: mean 4.72, SD 2.09 vs mean 8.09, SD 1.03; P<.001). The quality gap was large in the unstructured format (mean difference -4.33; 95% CI -4.79 to -3.86; P<.001) but smaller in the 4-subheading structured format (mean difference -2.33; 95% CI -2.79 to -1.86). Among the 30 ChatGPT-generated abstracts, 3 contained incorrect conclusions, and 10 were flagged as AI-generated content. The mean percentage of similarity between the original and the generated abstracts was low (2.10%-4.40%). The blinded reviewers achieved a 93% (224/240) accuracy rate in identifying which abstracts were written using ChatGPT.
Using ChatGPT to generate a scientific abstract from a real, human-written full text may not raise similarity (plagiarism) concerns. However, the quality of the ChatGPT-generated abstracts was suboptimal, and their accuracy was not 100%.
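The abstract does not specify which tool was used to compute the similarity percentages between original and generated abstracts. As an illustration only (an assumption, not the study's actual method), a rough word-level similarity percentage can be sketched with Python's standard-library difflib:

```python
from difflib import SequenceMatcher


def similarity_percent(original: str, generated: str) -> float:
    """Rough percentage similarity between two texts, compared word by word.

    Hypothetical illustration; the study's actual similarity-checking
    tool is not named in this abstract.
    """
    ratio = SequenceMatcher(None, original.split(), generated.split()).ratio()
    return round(ratio * 100, 2)


original = "ChatGPT may act as a research assistant to summarize research findings."
generated = "A language model can serve as an assistant that summarizes results."
print(similarity_percent(original, generated))
```

A reported similarity of 2.10%-4.40%, as in the results above, would indicate that the generated abstracts share very little verbatim wording with the originals.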