Optimal policy

  • Article type: Journal Article
    Avoiding physical contact is regarded as one of the safest and most advisable strategies for reducing pathogen spread. The flip side of this approach is that a lack of social interactions may negatively affect other dimensions of health, such as by inducing immunosuppressive anxiety and depression or by preventing important interactions with a diversity of microbes, which may be necessary to train our immune system or to maintain its normal levels of activity. These effects may in turn negatively affect a population's susceptibility to infection and the incidence of severe disease. We suggest that future pandemic modelling may benefit from relying on 'SIR+ models': epidemiological models extended to account for the benefits of social interactions that affect immune resilience. We develop an SIR+ model and discuss which specific interventions may be more effective in balancing the trade-off between minimizing pathogen spread and maximizing other interaction-dependent health benefits. Our SIR+ model reflects the idea that health is not merely the absence of disease, but rather a state of physical, mental and social well-being that can also depend on the same social connections that allow pathogen spread; the modelling of public health interventions for future pandemics should account for this multidimensionality.
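
    The abstract does not specify the SIR+ equations; the sketch below is only a minimal illustration of the trade-off it describes, under assumed functional forms: a social-interaction level c raises the transmission rate but also raises immune resilience (modeled here, purely for illustration, as a faster recovery rate). The function name and all parameter values are hypothetical.

```python
# Minimal sketch of an "SIR+"-style trade-off (hypothetical functional forms,
# not the authors' model): a social-interaction level c in [0, 1] raises the
# transmission rate but also raises immune resilience (faster recovery).
def simulate(c, beta0=0.4, gamma0=0.1, resilience=0.5, days=300, dt=0.1):
    beta = beta0 * c                       # more contact -> more transmission
    gamma = gamma0 * (1 + resilience * c)  # more contact -> better resilience (assumed)
    S, I, R = 0.99, 0.01, 0.0
    peak = I
    for _ in range(int(days / dt)):        # forward-Euler integration of SIR
        new_inf = beta * S * I
        new_rec = gamma * I
        S -= new_inf * dt
        I += (new_inf - new_rec) * dt
        R += new_rec * dt
        peak = max(peak, I)
    return peak

for c in (0.25, 0.5, 0.75, 1.0):
    print(f"contact level {c:.2f}: peak prevalence {simulate(c):.3f}")
```

    Sweeping c traces how peak prevalence responds to contact when immune resilience is also contact-dependent, which is the kind of trade-off an SIR+ analysis would weigh.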

  • Article type: Journal Article
    Ecological balance and stable economic development are crucial for fisheries. This study proposes a predator-prey system for marine communities in which predator growth follows the Allee effect and which accounts for rapid fluctuations in resource prices caused by supply and demand. The system predicts the existence of a catastrophic equilibrium at which the prey go extinct, consequently driving the predators extinct, while fishing effort remains high. To avoid such situations, marine protected areas are established adjacent to fishing areas: fish migrate rapidly between the two areas and are harvested only in the non-protected area. A three-dimensional simplified model is derived by applying variable aggregation to describe the variation of the global variables on a slow time scale. To find conditions that avoid species extinction and maintain sustainable fishing activity, the existence of positive equilibrium points and their local stability are explored on the basis of the simplified model. Moreover, the long-term impact on fishery dynamics of establishing marine protected areas and of levying a tax per unit catch is studied, and the optimal tax policy is obtained by applying Pontryagin's maximum principle. The theoretical analysis and numerical examples of this study demonstrate the combined effectiveness of increasing the proportion of marine protected areas and controlling taxes for the sustainable development of the fishery.
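
    As an illustration of the model structure described above (not the paper's exact equations), the sketch below integrates a three-variable system of prey, a predator with an Allee effect, and fishing effort that responds to after-tax profit, with a protected fraction m of the stock and a per-unit-catch tax tau. All coefficients and functional forms are hypothetical.

```python
# Minimal sketch of an aggregated fishery model (hypothetical coefficients and
# functional forms, not the paper's exact equations): prey n, predator p with an
# Allee effect, and fishing effort E that grows with after-tax profit. A fraction
# m of the stock sits in a marine protected area and is not harvested; tau is a
# tax per unit catch.
def step(n, p, E, m, tau, dt=0.01,
         r=1.0, K=1.0, a=1.2, b=0.8, A=0.15, d=0.3,
         q=0.5, price=2.0, cost=0.4, k=0.5):
    catch = q * (1 - m) * n * E                  # harvesting only outside the MPA
    dn = r * n * (1 - n / K) - a * n * p - catch
    dp = b * n * p * (p / (p + A)) - d * p       # Allee effect in predator growth
    dE = k * ((price - tau) * catch - cost * E)  # effort follows after-tax profit
    return n + dn * dt, p + dp * dt, E + dE * dt

n, p, E = 0.6, 0.3, 0.2
for t in range(200_000):                         # integrate to the slow time scale
    n, p, E = step(n, p, E, m=0.3, tau=0.5)
print(f"long-run state: prey={n:.3f}, predator={p:.3f}, effort={E:.3f}")
```

    Varying m and tau in such a sketch is the numerical analogue of the paper's question: which combinations keep the positive equilibrium (coexistence with profitable fishing) rather than the catastrophic one.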

  • Article type: Journal Article
    There is increasing interest in moving away from "one size fits all" (OSFA) approaches toward stratified treatment decisions. Understanding how expected effectiveness and cost-effectiveness vary with patient covariates is a key aspect of stratified decision making. Recently proposed machine learning (ML) methods can learn heterogeneity in outcomes without pre-specifying subgroups or functional forms, enabling the construction of decision rules ('policies') that map individual covariates to a treatment decision. However, these methods do not yet integrate ML estimates into a decision modeling framework that reflects long-term policy-relevant outcomes and synthesizes information from multiple sources. In this paper, we propose a method to integrate ML and decision modeling when individual patient data are available to estimate treatment-specific survival time. We also propose a novel implementation of policy tree algorithms to define subgroups using decision model output. We demonstrate these methods using SPRINT (the Systolic Blood Pressure Intervention Trial), comparing outcomes for "standard" and "intensive" blood pressure targets. We find that incorporating ML into a decision model can affect the estimate of incremental net health benefit (INHB) for OSFA policies. We also find evidence that stratifying treatment using subgroups defined by a tree-based algorithm can increase estimates of the INHB.
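
    The paper's pipeline (ML effect estimates feeding a decision model, with policy trees on the model output) is not reproduced here; the sketch below shows the general shape of the idea on synthetic data: estimate individual treatment effects with a T-learner, then let a shallow tree define interpretable subgroups with a treat/don't-treat rule per leaf. The data-generating process and all parameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for individual patient data (the paper uses SPRINT):
# X covariates, T treatment indicator, Y survival time with a heterogeneous effect.
n = 2000
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, size=n)
tau = 0.5 * X[:, 0]                       # true effect varies with one covariate
Y = 5 + X[:, 1] + tau * T + rng.normal(scale=0.5, size=n)

# T-learner: separate outcome models for the treated and control arms.
m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])
m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])
ite = m1.predict(X) - m0.predict(X)       # estimated individual treatment effects

# Shallow tree on the estimated effects: its leaves define interpretable
# subgroups, and each leaf is assigned treatment if its mean effect is positive.
tree = DecisionTreeRegressor(max_depth=2).fit(X, ite)
leaf = tree.apply(X)
policy = {l: ite[leaf == l].mean() > 0 for l in np.unique(leaf)}
treat = np.array([policy[l] for l in leaf])
print(f"share treated under tree policy: {treat.mean():.2f}")
```

    In the paper the quantity fed to the tree is decision-model output (e.g., per-patient INHB) rather than raw effect estimates, but the subgrouping mechanism is the same.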

  • Article type: Journal Article
    Mass public quarantining, colloquially known as a lockdown, is a non-pharmaceutical intervention to check the spread of disease. This paper presents ESOP (Epidemiologically and Socio-economically Optimal Policies), a novel application of active machine learning techniques using Bayesian optimization that interacts with an epidemiological model to arrive at lockdown schedules that optimally balance the public health benefits and the socio-economic downsides of reduced economic activity during lockdown periods. The utility of ESOP is demonstrated through case studies with VIPER (Virus-Individual-Policy-EnviRonment), a stochastic agent-based simulator that this paper also proposes. However, ESOP is flexible enough to interact with arbitrary epidemiological simulators in a black-box manner and to produce schedules that involve multiple phases of lockdown.
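
    Neither ESOP nor VIPER is reproduced here; the sketch below only illustrates the black-box pattern the abstract describes, using scikit-optimize's gp_minimize over a single-phase lockdown schedule and a toy deterministic SIR simulator. The cost weights, parameter ranges, and simulator are all assumptions.

```python
from skopt import gp_minimize

# Toy stand-in for the simulator: a deterministic SIR run in which one lockdown
# phase (start day, duration) scales transmission down. ESOP treats the real
# simulator as a black box; the weights below are assumptions.
def epidemic_cost(schedule, days=365, beta=0.3, gamma=0.1, lockdown_beta=0.1):
    start, duration = schedule
    S, I, R = 0.99, 0.01, 0.0
    health_cost = 0.0
    for day in range(days):
        b = lockdown_beta if start <= day < start + duration else beta
        new_inf = b * S * I
        S, I, R = S - new_inf, I + new_inf - gamma * I, R + gamma * I
        health_cost += I                       # cumulative prevalence as proxy harm
    econ_cost = 0.05 * duration                # lost activity per lockdown day
    return health_cost + econ_cost

# Bayesian optimization over the schedule, querying the simulator black-box style.
result = gp_minimize(epidemic_cost,
                     dimensions=[(0, 180), (0, 120)],  # start day, duration
                     n_calls=40, random_state=0)
print(f"best schedule: start day {result.x[0]}, {result.x[1]} days, "
      f"cost {result.fun:.2f}")
```

    A multi-phase schedule, as in the paper, would simply add more dimensions (one start/duration pair per phase) to the search space.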

  • Article type: Journal Article
    We analyze the role of disease containment policy, in the form of treatment, in a stochastic economic-epidemiological framework in which the probability of the occurrence of random shocks is state-dependent, namely related to the level of disease prevalence. Random shocks are associated with the diffusion of a new strain of the disease that affects both the number of infectives and the growth rate of infection, and the probability of such shocks being realized may be either increasing or decreasing in the number of infectives. We determine the optimal policy and the steady state of this stochastic framework, which is characterized by an invariant measure supported on strictly positive prevalence levels, suggesting that complete eradication is never a possible long-run outcome and that endemicity will instead prevail. Our results show that: (i) independently of the features of the state-dependent probabilities, treatment shifts the support of the invariant measure leftward; and (ii) the features of the state-dependent probabilities affect the shape and spread of the distribution of disease prevalence over its support, allowing for a steady-state outcome characterized by a distribution that is either highly concentrated over low prevalence levels or more spread out over a larger range of (possibly higher) prevalence levels.
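
    The paper's framework is not specified in this abstract; the Monte Carlo sketch below illustrates the qualitative mechanism under assumed forms: shocks arrive with a prevalence-dependent intensity, each shock raises both the level and the growth rate of infection, and a treatment rate u shifts the long-run prevalence distribution leftward without ever reaching zero. All functional forms and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo sketch of prevalence dynamics with state-dependent shocks
# (hypothetical forms, not the paper's model): new-strain shocks arrive with
# intensity lam(i) increasing in prevalence i, each shock bumps both the level
# and the growth rate of infection, and treatment u removes infectives.
def simulate(u, T=200.0, dt=0.01, beta=0.5, gamma=0.2, jump=0.05):
    i, g = 0.05, 0.0                    # prevalence and shock-driven growth boost
    for _ in range(int(T / dt)):
        lam = 0.2 * i                   # state-dependent jump intensity (assumed)
        if rng.random() < lam * dt:     # a new strain arrives
            i = min(1.0, i + jump)
            g += 0.02
        di = (beta + g) * i * (1 - i) - (gamma + u) * i
        i = min(1.0, max(1e-6, i + di * dt))   # never exactly eradicated
        g = max(0.0, g - 0.01 * g * dt)        # strain advantage decays slowly
    return i

for u in (0.0, 0.1, 0.2):
    draws = [simulate(u) for _ in range(50)]
    print(f"treatment u={u:.1f}: mean long-run prevalence {np.mean(draws):.3f}")
```

    The histogram of such draws is the simulation analogue of the paper's invariant measure: treatment moves its support left, while the shape of lam(i) governs how concentrated or spread out it is.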

  • Article type: Journal Article
    This paper provides a framework for understanding optimal lockdowns and makes three contributions. First, it theoretically analyzes lockdown policies and argues that policy makers systematically enact overly strict lockdowns because their incentives are misaligned with achieving the desired ends and because they cannot adapt to changing circumstances. Second, it provides a benchmark for determining how strongly policy makers in different locations should respond to COVID-19. Finally, it provides a framework for understanding how, when, and why lockdown policy can be expected to change.

  • Article type: Journal Article
    Alzheimer's disease (AD) is believed to be the most common type of dementia. Even though screening for AD has been discussed widely, no screening program has been implemented as part of policy in any country. Current medical research motivates focusing on the preclinical stages of the disease in a modeling initiative. We develop a partially observable Markov decision process (POMDP) model to determine optimal screening programs. The model contains disease-free and preclinical-AD partially observable states, and the screening decision is taken while an individual is in one of those states. An observable diagnosed-preclinical-AD state is integrated along with observable mild cognitive impairment, AD, and death states. Transition probabilities among states are estimated using data from the Knight Alzheimer's Disease Research Center (KADRC) and the relevant literature. With the objective of maximizing expected total quality-adjusted life years (QALYs), the output of the model is an optimal screening program that specifies at what points in time an individual over 50 years of age with a given risk of AD will be directed to undergo screening. The screening test used to diagnose preclinical AD has a positive disutility and is imperfect; its sensitivity and specificity are estimated using the KADRC data set. We study the impact of a potential intervention with parameterized effectiveness and disutility on model outcomes for three risk profiles (low, medium, and high). When intervention effectiveness and disutility are at their best, the optimal screening policy is to screen every year between ages 50 and 95, with an overall QALY gain of 0.94, 1.9, and 2.9 for the low-, medium-, and high-risk profiles, respectively. As intervention effectiveness diminishes and/or its disutility increases, the optimal policy changes to sporadic screening and then to never screening. Under several scenarios, some screening within the time horizon is optimal from a QALY perspective. Moreover, an in-depth analysis of costs reveals that implementing these policies is either cost-saving or cost-effective.
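
    The POMDP itself (state space, KADRC transition estimates, QALY rewards) is not given in the abstract; the sketch below illustrates only the core belief machinery such a model rests on, with made-up numbers: a Bayes update of the preclinical-AD belief under an imperfect test, and a simple threshold rule standing in for the optimal screening policy.

```python
# Minimal sketch of the belief machinery in such a POMDP (illustrative numbers,
# not estimates from the KADRC data): track the probability b that an individual
# is in the preclinical-AD state, update it by Bayes' rule after an imperfect
# screen, and screen when b exceeds a threshold.
SENS, SPEC = 0.85, 0.90           # assumed test sensitivity and specificity
P_ONSET = 0.02                    # assumed annual disease-free -> preclinical rate

def predict(b):
    """One-year transition: healthy individuals may move to preclinical AD."""
    return b + (1 - b) * P_ONSET

def update(b, positive):
    """Bayes update of the preclinical-AD belief after a screen result."""
    if positive:
        num, den = SENS * b, SENS * b + (1 - SPEC) * (1 - b)
    else:
        num, den = (1 - SENS) * b, (1 - SENS) * b + SPEC * (1 - b)
    return num / den

b = 0.05                          # prior risk for a 50-year-old (assumed)
for year in range(5):
    b = predict(b)
    screen = b > 0.10             # threshold rule stands in for the POMDP optimum
    print(f"year {year}: belief {b:.3f}, screen: {screen}")
    if screen:
        b = update(b, positive=False)   # e.g., observe a negative result
```

    The full model replaces the fixed threshold with a policy computed from QALY-maximizing dynamic programming over this belief state, which is what produces the annual/sporadic/never structure reported above.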

  • Article type: Journal Article
    Poker has been considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, features shared by many realistic problems such as auctions, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy in multi-player games is wise, and it is infeasible to validate theoretically whether a policy is optimal. Therefore, designing an effective optimal-policy learning method is of greater practical significance. This paper proposes an optimal-policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, this paper builds an Actor network to make decisions with imperfect information and a Critic network to evaluate policies with perfect information. Second, this paper proposes a novel multi-player poker policy update method: the asynchronous policy update algorithm (APU) and the dual-network asynchronous policy update algorithm (Dual-APU), for multi-player multi-policy scenarios and multi-player shared-policy scenarios, respectively. Finally, this paper uses the most popular six-player variant of Texas hold 'em poker to validate the performance of the proposed optimal-policy learning method. The experiments demonstrate that the policies learned by the proposed methods perform well and gain steadily compared with the existing approaches. In sum, policy learning methods for imperfect-information games based on Actor-Critic reinforcement learning perform well on poker and can be transferred to other imperfect-information games. Such training with perfect information and testing with imperfect information shows an effective and explainable approach to learning an approximately optimal policy.
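
    APU and Dual-APU are the paper's own algorithms and are not reproduced here; the sketch below shows only a generic tabular Actor-Critic update on a toy stand-in for poker (in the paper, the Critic additionally sees perfect information during training, whereas here both see the same toy state). The "hand bucket" states, payoff function, and learning rates are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic Actor-Critic sketch (not the paper's APU/Dual-APU): a softmax actor
# over 3 abstract betting actions conditioned on a private "hand strength"
# bucket, and a critic that estimates the value of that bucket.
N_STATES, N_ACTIONS = 4, 3
theta = np.zeros((N_STATES, N_ACTIONS))  # actor parameters
v = np.zeros(N_STATES)                   # critic values
alpha_pi, alpha_v = 0.05, 0.1

def toy_payoff(s, a):
    # stronger buckets reward more aggressive actions (illustrative only)
    return (s / (N_STATES - 1)) * a - 0.5 * a + rng.normal(scale=0.1)

for episode in range(5000):
    s = rng.integers(N_STATES)           # private information: hand bucket
    probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
    a = rng.choice(N_ACTIONS, p=probs)
    r = toy_payoff(s, a)
    td_error = r - v[s]                  # critic's one-step evaluation
    v[s] += alpha_v * td_error
    grad = -probs
    grad[a] += 1.0                       # grad of log-softmax at chosen action
    theta[s] += alpha_pi * td_error * grad

print("learned action probabilities per hand bucket:")
print(np.round(np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True), 2))
```

    Asynchrony in the paper's APU/Dual-APU concerns how multiple players' policies are updated relative to one another; the per-step actor and critic updates above are the common building block.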

  • Article type: Journal Article
    We investigate the optimal response of unemployment insurance to economic shocks, both with and without commitment. The optimal policy with commitment follows a modified Baily-Chetty formula that accounts for job search responses to future UI benefit changes. As a result, the optimal policy with commitment tends to front-load UI, unlike the optimal discretionary policy. In response to shocks intended to mimic those that induced the COVID-19 recession, we find that a large and transitory increase in UI is optimal; and that a policy rule contingent on the change in unemployment, rather than its level, is a good approximation to the optimal policy.
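
    The modified formula is not reproduced in the abstract; for orientation, the display below states the baseline Baily-Chetty condition in a common schematic form, abstracting from the duration- and budget-scaling terms that appear in full statements. The paper's version additionally accounts for job-search responses to future benefit changes.

```latex
% Baseline Baily-Chetty condition, schematic form (duration- and budget-scaling
% terms present in full statements are suppressed here):
\[
  \underbrace{\frac{u'(c_u) - u'(c_e)}{u'(c_e)}}_{\text{consumption-smoothing gain}}
  \;=\;
  \underbrace{\varepsilon_{D,b}}_{\text{moral-hazard cost}}
\]
% Notation: c_u and c_e are consumption when unemployed and employed, u(\cdot)
% is flow utility, and \varepsilon_{D,b} is the elasticity of unemployment
% duration D with respect to the benefit level b.
```

    The left side measures how much an extra dollar of benefits is worth for consumption smoothing; the right side measures how much it distorts job search. Front-loading under commitment arises because future benefit levels also enter current search incentives.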

  • Article type: Journal Article
    Wireless sensors are becoming essential in machine-type communications and the Internet of Things. Spectral efficiency and energy efficiency are the key performance metrics used to assess the effectiveness of sensor networks. In this paper, we present several power-splitting solutions that maximize the average harvested energy under a rate constraint when both information and power are transmitted through the same wireless channel to a sensor (i.e., a receiver). More specifically, we first design the optimal dynamic power-splitting policy, which decides the optimal fraction of the received signal power used for energy harvesting at the receiver. As effective solutions, we propose two types of single-threshold-based power-splitting policies, namely Policies I and II, which switch between energy harvesting and information decoding by comparing the received signal power with given thresholds. Additionally, we perform an asymptotic analysis for a large number of packets along with practical statistics-based policies. Consequently, we demonstrate the effectiveness of the proposed power-splitting solutions in terms of the rate-energy trade-off.
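
    As a rough illustration of the single-threshold idea (one natural variant; the paper's Policies I and II and their optimized thresholds are not reproduced here), the Monte Carlo sketch below harvests packets whose received power exceeds a threshold and decodes the rest, tracing the rate-energy trade-off as the threshold varies. The channel model and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo sketch of a single-threshold switching rule in the spirit of the
# paper's policies (parameters illustrative, not the optimized values): when the
# received power exceeds the threshold, the receiver harvests the whole signal;
# otherwise it decodes information. Rayleigh fading gives exponential power.
def evaluate(threshold, n_packets=100_000, snr0=10.0):
    p = snr0 * rng.exponential(size=n_packets)         # received power per packet
    harvest = p > threshold
    energy = p[harvest].sum() / n_packets              # average harvested energy
    rate = np.log2(1 + p[~harvest]).sum() / n_packets  # average decoded rate
    return energy, rate

for th in (5.0, 10.0, 20.0):
    e, r = evaluate(th)
    print(f"threshold {th:5.1f}: avg energy {e:6.2f}, avg rate {r:5.2f} bits/use")
```

    Raising the threshold diverts fewer packets to harvesting, so average energy falls while average rate rises; choosing the threshold that meets the rate constraint with the most residual energy is the optimization the paper carries out analytically.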