Keywords: Academic writing; ChatGPT; Editorial practices; Generative AI; Large Language Models; Peer review

Source: DOI:10.1186/s41073-023-00133-5

Abstract:
BACKGROUND: The emergence of systems based on large language models (LLMs) such as OpenAI's ChatGPT has created a range of discussions in scholarly circles. Since LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant or biased) outputs in response to provided prompts, using them in various writing tasks including writing peer review reports could result in improved productivity. Given the significance of peer reviews in the existing scholarly publication landscape, exploring challenges and opportunities of using LLMs in peer review seems urgent. After the generation of the first scholarly outputs with LLMs, we anticipate that peer review reports too would be generated with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks.
METHODS: To investigate the potential impact of using LLMs on the peer review process, we used five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer. These include 1) reviewers' role, 2) editors' role, 3) functions and quality of peer reviews, 4) reproducibility, and 5) the social and epistemic functions of peer reviews. We provide a small-scale exploration of ChatGPT's performance regarding identified issues.
RESULTS: LLMs have the potential to substantially alter the role of both peer reviewers and editors. Through supporting both actors in efficiently writing constructive reports or decision letters, LLMs can facilitate higher quality review and address issues of review shortage. However, the fundamental opacity of LLMs' training data, inner workings, data handling, and development processes raises concerns about potential biases, confidentiality and the reproducibility of review reports. Additionally, as editorial work has a prominent function in defining and shaping epistemic communities, as well as negotiating normative frameworks within such communities, partly outsourcing this work to LLMs might have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major enhancements in a short period and expect LLMs to continue developing.
CONCLUSIONS: We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While potentially beneficial to the scholarly communication system, many uncertainties remain and their use is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that if LLMs are used to write scholarly reviews and decision letters, reviewers and editors should disclose their use and accept full responsibility for data security and confidentiality, and their reports' accuracy, tone, reasoning and originality.