Procrastination is the voluntary but irrational postponement of a task despite awareness that the delay can lead to worse consequences. It has been studied extensively in psychology, from contributing factors to theoretical models. From the perspective of value-based decision making and reinforcement learning (RL), procrastination has been suggested to result from non-optimal choices caused by cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined whether a particular type of cognitive limitation, namely inaccurate valuation resulting from inadequate state representation, would cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of the SR. We modeled a series of behaviors of a "student" doing assignments during the school term, when putting off the assignments (i.e., procrastination) is not allowed, and during the vacation, when the "student" can freely choose whether to procrastinate. We assumed that the "student" had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the policy without procrastination. Through temporal-difference (TD) learning, the "student" learned the approximated value of each state, computed as a linear function of the state's features in the rigid reduced SR. During the vacation, the "student" decided at each time step whether to procrastinate based on these approximated values. Simulation results showed that the reduced SR-based RL model generated procrastination behavior, which worsened across episodes.
According to the values approximated by the "student," procrastinating was the better choice, whereas not procrastinating was mostly better according to the true values. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from adopting the reduced SR as the state representation. These findings indicate that the reduced SR, or more generally, dimension reduction in state representation, can be a potential form of cognitive limitation that leads to procrastination.
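
To make the mechanism concrete, the following is a minimal sketch, not the model used in the study. It illustrates how values approximated from a reduced SR through linear TD learning can diverge from the true values. Everything specific here is a hypothetical assumption chosen for illustration: the five-step chain of states, the discount factor, the reward profile (a small effort cost at each step and a payoff on completion), the learning rate, and the choice to reduce the SR to the single feature corresponding to the final state.

```python
import numpy as np

def successor_representation(n_states, gamma):
    """Full SR for a deterministic chain under the no-procrastination policy."""
    # State i always moves to i + 1; the last state ends the episode.
    P = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        P[i, i + 1] = 1.0
    # M[i, j] = discounted future occupancy of state j when starting from i.
    return np.linalg.inv(np.eye(n_states) - gamma * P)

def td_learn_weights(features, rewards, gamma, alpha=0.1, n_episodes=500):
    """Linear TD(0): the value of state s is approximated as features[s] @ w."""
    n_states = features.shape[0]
    w = np.zeros(features.shape[1])
    for _ in range(n_episodes):
        for s in range(n_states):
            # Value of the successor state (zero after the final step).
            v_next = features[s + 1] @ w if s + 1 < n_states else 0.0
            delta = rewards[s] + gamma * v_next - features[s] @ w  # TD error
            w += alpha * delta * features[s]
    return w

# Hypothetical task: four effortful steps, then a payoff on completion.
n_states, gamma = 5, 0.9
rewards = np.array([-0.1, -0.1, -0.1, -0.1, 1.0])

M = successor_representation(n_states, gamma)
true_values = M @ rewards            # exact values under this policy

# Hypothetical reduction: keep only the SR column for the final state.
reduced_sr = M[:, [n_states - 1]]
w = td_learn_weights(reduced_sr, rewards, gamma)
approx_values = reduced_sr @ w

print("true values:  ", np.round(true_values, 3))
print("approx values:", np.round(approx_values, 3))
```

Because the single retained feature can only scale a fixed discount profile, the approximated values cannot match the true values at every state in this toy setting. Whether such a mismatch favors procrastination depends on the task structure and the choice rule, which this sketch does not model.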