Optimal policy

  • Article type: Journal Article
    We consider the scheduling of battery charging for electric vehicles (EVs) integrated with renewable power generation. The increasing adoption of EVs and the development of renewable energy make this research important. Optimizing the charging schedule is challenging because of the large action space, the multi-stage decision making, and the high uncertainty, and solving the problem is time-consuming when the system is large. A practical and efficient method for scheduling EV charging is therefore needed. The contribution of this work is threefold. First, we provide a sufficient condition under which the charging of EVs can be completely self-sustained by distributed generation, and we propose an algorithm that obtains the optimal charging policy when this condition holds. Second, we investigate the scenario in which the supply of renewable power is deficient. We prove that when the renewable generation is deterministic, there exists an optimal policy that follows the modified least laxity and longer remaining processing time first (mLLLP) rule. Third, we provide an adaptive rule-based algorithm that efficiently obtains a near-optimal charging policy in general situations. We test the proposed algorithm in numerical experiments. The results show that it performs better than the other existing rule-based methods.
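
To make the scheduling idea in the abstract above concrete, the following is a minimal sketch of a plain "least laxity first, longer remaining processing time first" priority rule. It is not the paper's mLLLP rule, whose modification is not described in the abstract. The sketch assumes discrete time slots, unit-rate charging, and a per-slot renewable supply expressed as the number of EVs that can charge; the names `EV`, `pick_evs_to_charge`, and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EV:
    name: str
    remaining_demand: int    # charging slots still needed (assumed unit-rate charging)
    slots_to_departure: int  # slots left before the EV leaves

    @property
    def laxity(self) -> int:
        # Laxity: how many slots the EV can idle and still finish before departure.
        return self.slots_to_departure - self.remaining_demand


def pick_evs_to_charge(evs, capacity):
    """Choose EVs for one slot: least laxity first, ties go to longer remaining demand."""
    waiting = [ev for ev in evs if ev.remaining_demand > 0 and ev.slots_to_departure > 0]
    ranked = sorted(waiting, key=lambda ev: (ev.laxity, -ev.remaining_demand))
    return ranked[:capacity]


def simulate(evs, supply_per_slot):
    """Apply the rule slot by slot; return the schedule and the unmet demand per EV."""
    schedule = []
    for capacity in supply_per_slot:
        chosen = pick_evs_to_charge(evs, capacity)
        for ev in chosen:
            ev.remaining_demand -= 1
        for ev in evs:
            ev.slots_to_departure = max(0, ev.slots_to_departure - 1)
        schedule.append([ev.name for ev in chosen])
    unmet = {ev.name: ev.remaining_demand for ev in evs}
    return schedule, unmet


# Hypothetical fleet: total demand (5 slots) exceeds the deterministic supply (4 slots).
fleet = [EV("A", 2, 3), EV("B", 1, 1), EV("C", 2, 2)]
schedule, unmet = simulate(fleet, supply_per_slot=[2, 1, 1])
print(schedule)
print(unmet)
```

The example fleet is deliberately undersupplied, which corresponds to the deficient-supply scenario mentioned in the abstract; how the unmet demand should be valued is left open here.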

  • Article type: Journal Article
    Ecological balance and stable economic development are crucial for fisheries. This study proposes a predator-prey system for marine communities in which the growth of predators follows the Allee effect and which takes into account the rapid fluctuations in resource prices caused by supply and demand. The system predicts the existence of a catastrophic equilibrium, which may lead to the extinction of the prey and consequently of the predators while fishing effort remains high. To avoid this situation, marine protected areas are established near the fishing areas. Fish migrate rapidly between the two areas and are harvested only in the nonprotected areas. A three-dimensional reduced model is derived by applying variable aggregation to describe the variation of the global variables on a slow time scale. To find conditions that avoid species extinction and maintain sustainable fishing activity, the existence of positive equilibrium points and their local stability are explored on the basis of the reduced model. Moreover, the long-term impact on fishery dynamics of establishing marine protected areas and of levying taxes per unit catch is studied, and the optimal tax policy is obtained by applying Pontryagin's maximum principle. The theoretical analysis and numerical examples demonstrate the combined effectiveness of increasing the proportion of marine protected areas and controlling taxes for the sustainable development of the fishery.

  • Article type: Journal Article
    Poker is considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, which also arise in many real-world problems such as auctioning, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy in multi-player games is wise, and it is infeasible to validate theoretically whether a policy is optimal. Designing an effective method for learning an optimal policy is therefore of greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, the paper builds an Actor network that makes decisions with imperfect information and a Critic network that evaluates policies with perfect information. Second, the paper proposes novel multi-player poker policy update methods: the asynchronous policy update algorithm (APU) and the dual-network asynchronous policy update algorithm (Dual-APU), for multi-player multi-policy scenarios and multi-player sharing-policy scenarios, respectively. Finally, the paper uses the most popular variant, six-player Texas hold 'em poker, to validate the performance of the proposed optimal policy learning method. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with existing approaches. In sum, the policy learning methods for imperfect-information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect-information games. Training with perfect information and testing with imperfect-information models is shown to be an effective and explainable approach to learning an approximately optimal policy.

  • Article type: Journal Article
    Wireless sensors are becoming essential in machine-type communications and the Internet of Things. Spectral efficiency and energy efficiency are the key performance metrics considered when determining the effectiveness of sensor networks. In this paper, we present several power-splitting solutions that maximize the average harvested energy under a rate constraint when both information and power are transmitted through the same wireless channel to a sensor (i.e., a receiver). More specifically, we first design the optimal dynamic power-splitting policy, which decides the optimal fraction of the received signal power used for energy harvesting at the receiver. As effective alternatives, we propose two types of single-threshold-based power-splitting policies, namely Policies I and II, which switch between energy harvesting and information decoding by comparing the received signal power with given thresholds. Additionally, we perform an asymptotic analysis for a large number of packets, along with practical statistics-based policies. Finally, we demonstrate the effectiveness of the proposed power-splitting solutions in terms of the rate-energy trade-off.