Keywords: exploration, learning, redundant, reinforcement, sensorimotor

MeSH: Humans; Reinforcement, Psychology; Learning / physiology; Reward; Movement / physiology; Feedback; Psychomotor Performance / physiology

Source: DOI:10.1098/rspb.2023.1475

Abstract:
From a baby's babbling to a songbird practising a new tune, exploration is critical to motor learning. A hallmark of exploration is the emergence of random walk behaviour along solution manifolds, where successive motor actions are not independent but rather become serially dependent. Such exploratory random walk behaviour is ubiquitous across species' neural firing, gait patterns and reaching behaviour. Past work has suggested that exploratory random walk behaviour arises from an accumulation of movement variability and a lack of error-based corrections. Here, we test a fundamentally different idea: that reinforcement-based processes regulate random walk behaviour to promote continual motor exploration to maximize success. Across three human reaching experiments, we manipulated the size of both the visually displayed target and an unseen reward zone, as well as the probability of reinforcement feedback. Our empirical and modelling results parsimoniously support the notion that exploratory random walk behaviour emerges by utilizing knowledge of movement variability to update intended reach aim towards recently reinforced motor actions. This mechanism leads to active and continuous exploration of the solution manifold, which prominent theories currently hold arises passively. The ability to continually explore muscle, joint and task redundant solution manifolds is beneficial while acting in uncertain environments, during motor development or when recovering from a neurological disorder to discover and learn new motor actions.
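The core idea (reinforcement moves the intended aim towards recently reinforced reaches, which makes successive reaches serially dependent) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model: it assumes a single task-redundant dimension (for example, lateral position along a wide target), Gaussian motor noise, an unseen reward zone centred on zero, and an update rule that moves the aim fully onto the last reinforced reach. It omits how the reported model uses knowledge of movement variability. The function names `simulate_reaches` and `lag1_autocorrelation` and all parameter values are hypothetical.

```python
import random


def simulate_reaches(n_trials, reward_halfwidth, noise_sd, p_feedback=1.0, seed=0):
    """Toy reinforcement-based aim-update model (hypothetical, not the paper's model).

    Positions lie along one task-redundant dimension. The unseen reward zone
    is the interval [-reward_halfwidth, reward_halfwidth].
    """
    rng = random.Random(seed)
    aim = 0.0  # intended reach aim; starts at the centre of the reward zone
    reaches = []
    for _ in range(n_trials):
        # Executed reach = intended aim + Gaussian motor noise.
        reach = aim + rng.gauss(0.0, noise_sd)
        in_zone = abs(reach) <= reward_halfwidth
        # Reinforcement only for in-zone reaches, and only with probability p_feedback.
        reinforced = in_zone and rng.random() < p_feedback
        if reinforced:
            # Assumed update rule: move the aim onto the most recently reinforced reach.
            # Because the aim only ever moves to rewarded reaches, it stays inside the zone.
            aim = reach
        reaches.append(reach)
    return reaches


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation: a simple index of serial dependence between reaches."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


if __name__ == "__main__":
    # Compare reinforced vs. no-feedback conditions under identical noise settings.
    for p in (1.0, 0.0):
        reaches = simulate_reaches(2000, reward_halfwidth=10.0, noise_sd=1.0, p_feedback=p, seed=1)
        print(f"p_feedback={p}: lag-1 autocorrelation = {lag1_autocorrelation(reaches):.2f}")
```

Under these assumptions, reinforced trials should show strong positive lag-1 autocorrelation (a bounded random walk within the reward zone), while withholding feedback leaves the aim fixed and successive reaches close to independent. Exact values depend on the seed and parameters.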