MeSH: Humans; Comprehension / physiology; Language; Brain / physiology

Source: DOI:10.1126/sciadv.adn7744 PDF (PubMed)

Abstract:
Current large language models (LLMs) rely on word prediction as their backbone pretraining task. Although word prediction is an important mechanism underlying language processing, human language comprehension occurs at multiple levels, involving the integration of words and sentences to achieve a full understanding of discourse. This study models language comprehension by using the next sentence prediction (NSP) task to investigate mechanisms of discourse-level comprehension. We show that NSP pretraining enhanced a model's alignment with brain data especially in the right hemisphere and in the multiple demand network, highlighting the contributions of nonclassical language regions to high-level language understanding. Our results also suggest that NSP can enable the model to better capture human comprehension performance and to better encode contextual information. Our study demonstrates that the inclusion of diverse learning objectives in a model leads to more human-like representations, and investigating the neurocognitive plausibility of pretraining tasks in LLMs can shed light on outstanding questions in language neuroscience.
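For readers unfamiliar with the NSP objective the abstract builds on, the sketch below illustrates what the task asks of a model: given a sentence pair, classify whether the second sentence actually follows the first. It uses Hugging Face's BertForNextSentencePrediction as a stand-in; the paper's actual model architecture and training setup are not specified in this abstract, and the example sentences are invented for illustration.

```python
# Minimal sketch of the next sentence prediction (NSP) task, using a
# pretrained BERT NSP head from the transformers library. This is only
# an illustration of the objective, not the paper's own pipeline.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "The experiment compared model activations with brain recordings."
sentence_b = "Alignment was strongest in right-hemisphere regions."  # candidate continuation

# Encode the pair; BERT joins the segments with [SEP] and distinguishes
# them via token_type_ids, which the NSP head relies on.
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 2)

# In this head, label 0 = sentence_b continues sentence_a; label 1 = random pair.
probs = torch.softmax(logits, dim=-1)
print(f"P(is next sentence) = {probs[0, 0].item():.3f}")
```

During NSP pretraining, such pairs are drawn half from genuine consecutive sentences and half from randomly paired ones, so the model must learn discourse-level coherence rather than word-level statistics alone, which is the property the study exploits.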