Reinforcement Learning

  • Article type: Journal Article
    In this article, an adaptive optimal consensus control problem is studied for multiagent systems in the strict-feedback structure with intermittent constraints (constraints that appear intermittently). More specifically, by designing a novel switch-like function and an improved coordinate transformation, the constrained states are converted into unconstrained states, and the problem of intermittent constraints is resolved without requiring "feasibility conditions". In addition, using a composite learning algorithm and neural networks to construct the identifier, a simplified identifier-actor-critic-based reinforcement learning strategy is proposed to obtain the approximate optimal controller under the backstepping framework. Meanwhile, with the aid of the nonlinear dynamic surface control technique, the "explosion of complexity" issue in backstepping is removed, and the requirements on the filter parameters are relaxed. Based on Lyapunov stability theory, it is demonstrated that all signals in the closed-loop system are bounded. Finally, two simulation examples are used to verify the effectiveness of the proposed method.
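    The abstract names a switch-like function and an improved coordinate transformation but gives no closed forms. Below is a minimal Python sketch of one plausible construction, assuming a smooth logistic-ramp switch that activates on a constraint interval [t_on, t_off] and a symmetric barrier map x/(kb^2 - x^2); all names, forms, and constants are illustrative, not the paper's.

    ```python
    import numpy as np

    def switch(t, t_on, t_off, eps=0.05):
        """Switch-like function: ~1 inside [t_on, t_off], ~0 outside.
        Built from two logistic ramps so the transition is smooth."""
        rise = 1.0 / (1.0 + np.exp(-(t - t_on) / eps))
        fall = 1.0 / (1.0 + np.exp((t - t_off) / eps))
        return rise * fall

    def transform(x, t, kb=1.0, t_on=2.0, t_off=5.0):
        """Blend the identity map with a barrier-style map that diverges
        as |x| -> kb (valid for |x| < kb while the constraint is active)."""
        lam = switch(t, t_on, t_off)
        barrier = x / (kb**2 - x**2)
        return lam * barrier + (1.0 - lam) * x
    ```

    Because the barrier map diverges as |x| approaches kb, keeping the transformed state bounded keeps x inside (-kb, kb) whenever the switch is active; when it is inactive, the map reduces to the identity, which is consistent with dropping the "feasibility conditions" outside the constraint intervals.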

  • Article type: Journal Article
    Reinforcement Learning (RL) has recently found many applications in the healthcare domain thanks to its natural fit to clinical decision-making and its ability to learn optimal decisions from observational data. A key challenge in adopting RL-based solutions in clinical practice, however, is the inclusion of existing knowledge in learning a suitable solution. Existing knowledge, e.g., from medical guidelines, may improve the safety of solutions, produce a better balance between short- and long-term outcomes for patients, and increase trust and adoption by clinicians. We present a framework for including knowledge available from medical guidelines in RL. The framework includes components for enforcing safety constraints and an approach that alters the learning signal to better balance short- and long-term outcomes based on these guidelines. We evaluate the framework by extending an existing RL-based mechanical ventilation (MV) approach with clinically established ventilation guidelines. Results from off-policy policy evaluation indicate that our approach has the potential to decrease 90-day mortality while ensuring lung-protective ventilation. This framework provides an important stepping stone toward implementing RL in clinical practice and opens up several avenues for further research.
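    The abstract describes two components, hard safety constraints and a guideline-shaped learning signal, without implementation detail. Below is a minimal Python sketch of both ideas for the MV setting, assuming a discrete candidate-action set; the VentAction fields, the 6-8 ml/kg tidal-volume targets, and the PEEP range are illustrative stand-ins for the paper's clinically established guidelines.

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class VentAction:
        tidal_volume_ml_per_kg: float
        peep_cmh2o: float

    def is_guideline_safe(a: VentAction) -> bool:
        """Hard constraint: reject actions outside lung-protective ranges."""
        return a.tidal_volume_ml_per_kg <= 8.0 and 5.0 <= a.peep_cmh2o <= 15.0

    def shaped_reward(base_reward: float, a: VentAction, weight: float = 0.1) -> float:
        """Soft constraint: penalize deviation from a 6 ml/kg target so that
        short-term guideline adherence is traded off against the long-term
        (e.g., mortality-based) reward."""
        return base_reward - weight * abs(a.tidal_volume_ml_per_kg - 6.0)

    def greedy_safe_action(q, candidates):
        """Greedy selection restricted to the guideline-safe action set;
        q is any callable mapping an action to its estimated value."""
        return max((a for a in candidates if is_guideline_safe(a)), key=q)
    ```

    Masking unsafe actions keeps the learned policy inside the guideline envelope regardless of what the critic estimates, while the shaped reward only biases, rather than overrides, the long-term mortality-driven objective.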

  • Article type: Journal Article
    This paper develops a novel Proportional-Integral-Derivative (PID) tuning method for multi-agent systems with a reinforced self-learning capability for achieving the optimal consensus of all agents. Unlike traditional model-based and data-driven PID tuning methods, the developed PID self-learning method updates the controller parameters by actively interacting with an unknown environment, with guaranteed consensus and optimized agent performance as outcomes. First, the PID control-based consensus problem of multi-agent systems is formulated. Then, finding the PID gains is converted into solving a nonzero-sum game problem, and an off-policy Q-learning algorithm with a critic-only structure is proposed to update the PID gains using only data, without knowledge of the agents' dynamics. Finally, simulation results verify the effectiveness of the proposed method.
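    The abstract specifies a critic-only, off-policy Q-learning update for the PID gains but no equations. The sketch below is a deliberately coarse tabular stand-in: a discretized scalar tracking error serves as the state, a small finite set of candidate gain triples serves as the action set, and the critic is learned purely from logged transitions. The paper's actual formulation is continuous and game-theoretic; all names and constants here are illustrative.

    ```python
    import numpy as np

    # Candidate (Kp, Ki, Kd) triples treated as a discrete action set.
    GAINS = [(1.0, 0.1, 0.0), (2.0, 0.5, 0.1), (4.0, 1.0, 0.2)]
    N_BINS = 11  # discretized tracking-error states

    def pid_control(gains, e, e_int, e_dot):
        """Standard PID law; the learned policy picks which gain triple to apply."""
        kp, ki, kd = gains
        return kp * e + ki * e_int + kd * e_dot

    def bin_error(e, lo=-1.0, hi=1.0):
        """Map a continuous tracking error onto a state index."""
        return int(np.clip((e - lo) / (hi - lo) * (N_BINS - 1), 0, N_BINS - 1))

    def q_learning(transitions, alpha=0.1, gamma=0.95):
        """Critic-only, off-policy Q-learning over logged (e, a, r, e') tuples:
        the max in the target makes it off-policy, and no model of the agent
        dynamics is used, only the data."""
        Q = np.zeros((N_BINS, len(GAINS)))
        for e, a, r, e_next in transitions:
            s, s_next = bin_error(e), bin_error(e_next)
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        return Q
    ```

    After training, the greedy gain schedule for the current error e is GAINS[int(Q[bin_error(e)].argmax())], i.e., the gains follow directly from the critic with no separate actor, matching the critic-only structure.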