Generative AI tools exemplified by ChatGPT are becoming a new reality. This study is motivated by the premise that "AI-generated content may exhibit a distinctive behavior that can be separated from scientific articles." In this study, we show how articles can be generated by means of prompt engineering for various diseases and conditions. We then show how we tested this premise in two phases and demonstrate its validity. Subsequently, we introduce xFakeSci, a novel learning algorithm capable of distinguishing ChatGPT-generated articles from publications produced by scientists. The algorithm is trained using network models derived from both sources. To mitigate overfitting, we incorporated a calibration step built upon data-driven heuristics, including proximity and ratios. Specifically, from a total of 3952 fake articles covering three different medical conditions, the algorithm was trained using only 100 articles, but calibrated using folds of 100 articles. The classification step was performed using 300 articles per condition. The actual labeling step was carried out against an equal mix of 50 generated articles and 50 authentic PubMed abstracts. The testing also spanned publication periods from 2010 to 2024 and encompassed research on three distinct diseases: cancer, depression, and Alzheimer's disease. Further, we evaluated the accuracy of the xFakeSci algorithm against several classical data mining algorithms (e.g., Support Vector Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1 scores ranging from 80% to 94%, outperforming the common data mining algorithms, which scored F1 values between 38% and 52%. We attribute this noticeable difference to the introduction of the calibration and proximity-distance heuristics, which underpin this promising performance. Indeed, predicting fake science generated by ChatGPT presents a considerable challenge.
Nonetheless, the introduction of the xFakeSci algorithm is a significant step toward combating fake science.
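As an illustration of the kind of baseline comparison described above (not the authors' actual pipeline, corpora, or the xFakeSci algorithm itself), the following minimal sketch trains the three named classical classifiers on a hypothetical toy corpus of "generated" versus "authentic" documents and scores each with F1. The documents, vectorizer choice, and train/test split are all assumptions made purely for demonstration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

# Hypothetical toy corpus: two stylistically distinct document groups
# standing in for ChatGPT-generated articles (label 1) and authentic
# PubMed abstracts (label 0).
fake = [f"broad overview {i}: this article discusses promising outcomes generally"
        for i in range(25)]
real = [f"cohort study {i}: we measured defined endpoints in enrolled patients"
        for i in range(25)]
docs, labels = fake + real, [1] * 25 + [0] * 25

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.4, random_state=0, stratify=labels)

# Bag-of-words TF-IDF features; an assumed representation, not the paper's.
vec = TfidfVectorizer()
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

# Fit each classical baseline and record its F1 score on the held-out split.
results = {}
for name, clf in [("SVM", LinearSVC()),
                  ("LogisticRegression", LogisticRegression()),
                  ("NaiveBayes", MultinomialNB())]:
    clf.fit(Xtr, y_train)
    results[name] = f1_score(y_test, clf.predict(Xte))

print(results)
```

On realistic data, the reported gap (F1 of 38% to 52% for such baselines versus 80% to 94% for xFakeSci) would come from the calibration and proximity-distance heuristics that this sketch deliberately omits.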