MeSH terms: Reward; Dopaminergic Neurons / physiology, metabolism; Dopamine / metabolism; Animals; Algorithms; Learning / physiology; Models, Neurological; Humans; Reinforcement, Psychology; Brain / physiology, metabolism; Neuronal Plasticity / physiology; Time Factors

Source: DOI: 10.1038/s41467-024-50205-3 (PDF via PubMed)

Abstract:
The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) learning, whereby certain units signal reward prediction errors (RPEs). The TD algorithm has traditionally been mapped onto the dopaminergic system, as the firing properties of dopamine neurons can resemble RPEs. However, certain predictions of TD learning are inconsistent with experimental results, and previous implementations of the algorithm have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternate framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, dopamine release is similar, but not identical, to RPE, leading to predictions that contrast with those of TD. While FLEX itself is a general theoretical framework, we describe a specific, biophysically plausible implementation, the results of which are consistent with a preponderance of both existing and reanalyzed experimental data.
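For orientation, the sketch below illustrates the baseline the abstract argues against, not the paper's FLEX model: TD(0) learning in a single conditioning trial using a stimulus-specific fixed temporal basis (the "complete serial compound"), where the RPE delta_t = r_{t+1} + gamma*V(t+1) - V(t) drives learning and, after training, shifts from reward time to cue onset. All names and parameter values (T, cue_t, reward_t, gamma, alpha) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's FLEX implementation):
# classical TD(0) with a "complete serial compound" representation, i.e.
# a stimulus-specific *fixed* temporal basis -- the unscalable assumption
# the abstract criticizes.

T = 20                     # time steps per trial (assumed)
cue_t, reward_t = 5, 15    # cue onset and reward delivery times (assumed)
gamma, alpha = 0.98, 0.1   # discount factor, learning rate (assumed)

def features(t):
    # Fixed temporal basis: one indicator unit per time step since cue onset.
    x = np.zeros(T)
    if t >= cue_t:
        x[t - cue_t] = 1.0
    return x

w = np.zeros(T)            # weights over the temporal basis; V(t) = w . x(t)

for trial in range(500):
    for t in range(T - 1):
        r = 1.0 if t + 1 == reward_t else 0.0
        v, v_next = w @ features(t), w @ features(t + 1)
        rpe = r + gamma * v_next - v      # reward prediction error (delta)
        w += alpha * rpe * features(t)    # TD(0) weight update

# After learning, the RPE at reward time shrinks toward zero and a positive
# RPE appears at cue onset -- the canonical dopamine-like signature of TD.
for t in range(T - 1):
    r = 1.0 if t + 1 == reward_t else 0.0
    rpe = r + gamma * (w @ features(t + 1)) - w @ features(t)
    print(t, round(rpe, 3))
```

Because each time step since cue onset needs its own basis unit, this representation must be duplicated for every stimulus and every possible cue-reward interval, which is the scalability problem motivating FLEX's learned (rather than fixed) temporal expectations.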