Keywords: blocking; in-context learning; large language models; metalearning; neural networks


Abstract:
Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tradeoff spontaneously emerges with "in-context learning" (ICL) both in neural networks trained with metalearning and in large language models (LLMs). ICL is the ability to learn new tasks "in context" - without weight changes - via an inner-loop algorithm implemented in activation dynamics. Experiments with pretrained LLMs and metalearning transformers show that ICL exhibits the blocking advantage demonstrated in humans on a task involving rule-like structure, and conversely, that concurrent in-weight learning reproduces the interleaving advantage observed in humans on tasks lacking such structure.
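Since the abstract turns on the contrast between blocked and interleaved example curricula presented in context, the following is a minimal sketch, assuming a toy two-category task, of how such curricula might be assembled into a few-shot prompt for an LLM. The stimuli, category labels, function names, and prompt format are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of blocked vs. interleaved
# in-context curricula for a few-shot prompt.
# The toy task, stimuli, and prompt format below are illustrative assumptions.
import random

# Toy study items: (stimulus, category) pairs for two categories, A and B.
STUDY_ITEMS = {
    "A": [("small square", "A"), ("small circle", "A"), ("small triangle", "A")],
    "B": [("large square", "B"), ("large circle", "B"), ("large triangle", "B")],
}

def blocked_curriculum(items):
    """All examples of one category, then all examples of the other."""
    return items["A"] + items["B"]

def interleaved_curriculum(items, seed=0):
    """Examples of the two categories mixed across trials."""
    mixed = items["A"] + items["B"]
    random.Random(seed).shuffle(mixed)
    return mixed

def to_prompt(curriculum, query):
    """Render the study examples plus a held-out query as a few-shot prompt."""
    lines = [f"Stimulus: {x} -> Category: {y}" for x, y in curriculum]
    lines.append(f"Stimulus: {query} -> Category:")
    return "\n".join(lines)

if __name__ == "__main__":
    # Same study set, two orderings; only the curriculum differs.
    print(to_prompt(blocked_curriculum(STUDY_ITEMS), "small star"))
    print()
    print(to_prompt(interleaved_curriculum(STUDY_ITEMS), "small star"))
```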