Keywords: assignment-of-credit; instrumental learning; mutual information; operant conditioning

MeSH: Animals; Reinforcement, Psychology; Rats; Learning / physiology; Time Factors; Bayes Theorem

Source: DOI:10.1073/pnas.2405451121 (PDF available via PubMed)

Abstract:
Reinforcement learning inspires much theorizing in neuroscience, cognitive science, machine learning, and AI. A central question concerns the conditions that produce the perception of a contingency between an action and reinforcement (the assignment-of-credit problem). Contemporary models of associative and reinforcement learning do not leverage temporal metrics (measured intervals). Our information-theoretic approach formalizes contingency by time-scale invariant temporal mutual information. It predicts that learning may proceed rapidly even with extremely long action-reinforcer delays. We show that rats can learn an action after a single reinforcement, even with a 16-min delay between the action and reinforcement (15-fold longer than any delay previously shown to support such learning). By leveraging metric temporal information, our solution obviates the need for windows of associability, exponentially decaying eligibility traces, microstimuli, or distributions over Bayesian belief states. Its three equations have no free parameters; they predict one-shot learning without iterative simulation.
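The time-scale invariance claim can be made concrete with a minimal sketch in the spirit of Balsam and Gallistel's earlier informativeness measure; the symbols used here (λ_{R|A}, the reinforcement rate following the action, and λ_R, the contextual base rate of reinforcement) are illustrative assumptions, not a reproduction of the paper's three equations:

\[
I_{A \to R} \;=\; \log \frac{\lambda_{R \mid A}}{\lambda_{R}} \;=\; \log \frac{\bar{t}_{R}}{\bar{t}_{R \mid A}} \quad \text{(nats per reinforcer)},
\]

where \(\bar{t}_{R}\) and \(\bar{t}_{R \mid A}\) are the mean reinforcer-to-reinforcer and action-to-reinforcer intervals. Because the quantity depends only on a ratio of rates (equivalently, of mean intervals), rescaling every interval by a common factor leaves it unchanged; on such an account, a 16-min action-reinforcer delay need not impede learning, provided the delay is short relative to the contextual interval between reinforcements.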