Keywords: Decision making; Deep learning; Machine learning; Reinforcement learning; Stochastic Calculus-guided Reinforcement Learning; Stochastic calculus

Source: DOI: 10.1016/j.mex.2024.102790 (PDF via PubMed)

Abstract:
Stochastic Calculus-guided Reinforcement Learning (SCRL) is a new approach to decision-making under uncertainty. It applies mathematical principles from stochastic calculus to make better choices and improve decision-making in complex situations, and it outperforms traditional Stochastic Reinforcement Learning (SRL) methods. In tests, SCRL adapted and performed well: it achieved a lower dispersion value of 63.49 compared with SRL's 65.96, meaning its results varied less. SCRL also carried lower short- and long-term risk. Its short-term risk value was 0.64 and its long-term risk value was 0.78, whereas SRL's were much higher at 18.64 and 10.41 respectively; lower risk values are better because they indicate a smaller chance of failure. Additional metrics, namely training rewards, learning progress, and rolling averages, were also assessed for SRL and SCRL, and the study found that SCRL outperforms SRL on these as well. Overall, SCRL is a better way to make decisions under uncertainty: it uses mathematics to make smarter choices and carries less risk than other methods, which makes it well suited to real-world situations where decisions must be made carefully.
•By leveraging mathematical principles derived from stochastic calculus, SCRL offers a robust framework for making informed choices and enhancing performance in complex scenarios.
•In comparison to traditional SRL methods, SCRL demonstrates superior adaptability and efficacy, as evidenced by empirical tests.
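The abstract does not give implementation details, so the snippet below is only a minimal illustrative sketch, not the authors' SCRL algorithm. It assumes a toy one-dimensional environment driven by a stochastic differential equation, simulated with Euler–Maruyama discretisation, and shows how the kinds of evaluation statistics the abstract mentions (dispersion of returns and rolling averages of rewards) can be computed when comparing two policies. All names here (`euler_maruyama_step`, `run_episode`, `dispersion`, `rolling_average`, the reward shape, and the volatility constant) are hypothetical choices made for illustration.

```python
import numpy as np


def euler_maruyama_step(x, drift, diffusion, dt, rng):
    """One Euler-Maruyama step of dX = drift(X) dt + diffusion(X) dW (illustrative)."""
    dW = rng.normal(0.0, np.sqrt(dt))
    return x + drift(x) * dt + diffusion(x) * dW


def run_episode(policy, n_steps=200, dt=0.01, seed=0):
    """Roll out a policy in a toy SDE-driven environment and return per-step rewards."""
    rng = np.random.default_rng(seed)
    x, rewards = 0.0, []
    for _ in range(n_steps):
        a = policy(x)                                  # action chosen from the current state
        drift = lambda s, a=a: a - 0.5 * s             # action shifts the drift term (assumed dynamics)
        diffusion = lambda s: 0.2                      # constant volatility (assumed)
        x = euler_maruyama_step(x, drift, diffusion, dt, rng)
        rewards.append(-abs(x))                        # reward: stay close to the origin (assumed)
    return np.array(rewards)


def dispersion(values):
    """Spread of episode returns; a lower value means more consistent behaviour."""
    return float(np.std(values))


def rolling_average(values, window=10):
    """Rolling mean, as used to visualise learning progress over episodes."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


if __name__ == "__main__":
    baseline_policy = lambda s: 0.0                    # no control at all
    corrective_policy = lambda s: -0.5 * s             # stand-in for a learned policy
    for name, pi in [("baseline", baseline_policy), ("corrective", corrective_policy)]:
        returns = np.array([run_episode(pi, seed=i).sum() for i in range(50)])
        smooth = rolling_average(returns)
        print(f"{name}: mean return {returns.mean():.2f}, "
              f"dispersion {dispersion(returns):.2f}, "
              f"last rolling-average return {smooth[-1]:.2f}")
```

Under these assumptions, a policy that counteracts the drift produces returns with a smaller dispersion than the uncontrolled baseline, which mirrors the kind of comparison (lower dispersion, lower risk) the abstract reports between SCRL and SRL.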