Reinforcement Learning

  • Article type: Journal Article
    In the contemporary digitalization landscape and technological advancement, the auction industry undergoes a metamorphosis, assuming a pivotal role as a transactional paradigm. Functioning as a mechanism for pricing commodities or services, the procedural intricacies and efficiency of auctions directly influence market dynamics and participant engagement. Harnessing the advancing capabilities of artificial intelligence (AI) technology, the auction sector proactively integrates AI methodologies to augment efficacy and enrich user interactions. This study delves into the intricacies of the price prediction challenge within the auction domain, introducing a sophisticated RL-GRU framework for price interval analysis. The framework commences by adeptly conducting quantitative feature extraction of commodities through GRU, subsequently orchestrating dynamic interactions within the model's environment via reinforcement learning techniques. Ultimately, it accomplishes the task of interval division and recognition of auction commodity prices through a discerning classification module. Demonstrating precision exceeding 90% across publicly available and internally curated datasets within five intervals and exhibiting superior performance within eight intervals, this framework contributes valuable technical insights for future endeavours in auction price interval prediction challenges.
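
    The abstract describes the RL-GRU pipeline only at a high level (GRU feature extraction, reinforcement-learning interaction with the environment, then an interval classifier). The sketch below is one hedged reading of that pipeline, not the authors' implementation: a GRU encoder feeds a softmax policy over price intervals and is trained REINFORCE-style, with reward 1 for predicting the correct interval. Names such as AuctionPricePolicy, FEATURE_DIM, and K_INTERVALS are illustrative assumptions.

```python
import torch
import torch.nn as nn

K_INTERVALS = 5      # five price intervals, as in the paper's first setting
FEATURE_DIM = 16     # per-step commodity feature size (illustrative)

class AuctionPricePolicy(nn.Module):
    def __init__(self, feature_dim=FEATURE_DIM, hidden=64, k=K_INTERVALS):
        super().__init__()
        self.gru = nn.GRU(feature_dim, hidden, batch_first=True)   # quantitative feature extractor
        self.head = nn.Linear(hidden, k)                           # classification module over intervals

    def forward(self, x):                    # x: (batch, seq_len, feature_dim)
        _, h = self.gru(x)                   # h: (1, batch, hidden)
        return self.head(h[-1])              # logits over price intervals

policy = AuctionPricePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(features, true_interval):
    """One policy-gradient update: sample an interval, reward 1 if it is correct."""
    dist = torch.distributions.Categorical(logits=policy(features))
    action = dist.sample()
    reward = (action == true_interval).float()
    loss = -(dist.log_prob(action) * reward).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return reward.mean().item()
```

    For example, reinforce_step(torch.randn(8, 10, FEATURE_DIM), torch.randint(0, K_INTERVALS, (8,))) performs one update on a batch of eight commodity feature sequences of length ten.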

  • Article type: Journal Article
    Hypertension is a major risk factor for many serious diseases. With the aging population and lifestyle changes, the incidence of hypertension continues to rise, imposing a significant medical cost burden on patients and severely affecting their quality of life. Early intervention can greatly reduce the prevalence of hypertension. Research on hypertension early warning models based on electronic health records (EHRs) is an important and effective method for achieving early hypertension warning. However, limited by the scarcity and imbalance of multivisit records, and the nonstationary characteristics of hypertension features, it is difficult to predict the probability of hypertension prevalence in a patient effectively. Therefore, this study proposes an online hypertension monitoring model (HRP-OG) based on reinforcement learning and generative feature replay. It transforms the hypertension prediction problem into a sequential decision problem, achieving risk prediction of hypertension for patients using multivisit records. Sensors embedded in medical devices and wearables continuously capture real-time physiological data such as blood pressure, heart rate, and activity levels, which are integrated into the EHR. The fit between the samples generated by the generator and the real visit data is evaluated using maximum likelihood estimation, which can reduce the adversarial discrepancy between the feature space of hypertension and incoming incremental data, and the model is updated online based on real-time data using generative feature replay. The incorporation of sensor data ensures that the model adapts dynamically to changes in the condition of patients, facilitating timely interventions. In this study, the publicly available MIMIC-III data are used for validation, and the experimental results demonstrate that compared to existing advanced methods, HRP-OG can effectively improve the accuracy of hypertension risk prediction for few-shot multivisit records in nonstationary environments.
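
    The abstract names the ingredients (a generator fitted to real visit data via maximum likelihood, generative feature replay, and online model updates) without giving details, so the following is a minimal sketch under assumptions rather than the paper's HRP-OG implementation: past visit features are summarized by a maximum-likelihood Gaussian, its samples are replayed alongside each incoming real batch, and the Gaussian's log-likelihood of the new visits scores how well generated and real data agree. The class names and the use of scikit-learn's SGDClassifier are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")        # stand-in online hypertension-risk classifier

class GaussianReplay:
    """Maximum-likelihood Gaussian summary of previously seen visit features."""
    def fit(self, X):
        self.mean = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])
        return self

    def sample(self, n):
        return rng.multivariate_normal(self.mean, self.cov, size=n)

    def log_likelihood(self, X):
        """Average Gaussian log-likelihood of the incoming real visits."""
        d = X - self.mean
        inv = np.linalg.inv(self.cov)
        _, logdet = np.linalg.slogdet(self.cov)
        quad = np.sum(d @ inv * d, axis=1)
        return float(np.mean(-0.5 * (quad + logdet + X.shape[1] * np.log(2 * np.pi))))

def online_update(replay, replay_labels, X_new, y_new, n_replay=32):
    """Mix replayed past features with the incoming batch, then update the model online."""
    X_rep = replay.sample(n_replay)
    y_rep = rng.choice(replay_labels, size=n_replay)   # crude stand-in for class-conditional replay
    fit_score = replay.log_likelihood(X_new)           # MLE-based fit between replay and real visits
    X = np.vstack([X_new, X_rep])
    y = np.concatenate([y_new, y_rep])
    model.partial_fit(X, y, classes=np.array([0, 1]))
    return fit_score
```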

  • Article type: Journal Article
    The ability to make informed decisions in complex scenarios is crucial for intelligent automotive systems. Traditional expert rules and other methods often fall short in complex contexts. Recently, reinforcement learning has garnered significant attention due to its superior decision-making capabilities. However, inaccurate target network estimation limits its decision-making ability in complex scenarios. This paper focuses on the underestimation phenomenon and proposes an end-to-end autonomous driving decision-making method based on an improved TD3 algorithm. The method employs a forward camera to capture data. By introducing a new critic network to form a triple-critic structure and combining it with the target maximization operation, the underestimation problem in the TD3 algorithm is solved. Subsequently, a multi-timestep averaging method is used to address the policy instability caused by the new single critic. In addition, this paper uses the Carla platform to construct multi-vehicle unprotected left-turn and congested lane-center driving scenarios and verifies the algorithm in them. The results demonstrate that our method surpasses the baseline DDPG and TD3 algorithms in aspects such as convergence speed, estimation accuracy, and policy stability.
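
    The abstract states that a third critic plus a "target maximization operation" counters TD3's underestimation; the exact combination rule is not given, so the snippet below is a hedged sketch of one plausible target computation, not the paper's code: the usual clipped double-Q term (a min over two target critics) is combined with the third critic through a max before bootstrapping.

```python
import torch

def triple_critic_target(q1_t, q2_t, q3_t, reward, done, gamma=0.99):
    """q*_t: target-critic values for the next state-action pair, shape (batch, 1)."""
    pessimistic = torch.min(q1_t, q2_t)        # standard TD3 term, prone to underestimation
    combined = torch.max(pessimistic, q3_t)    # assumed "target maximization" with the new critic
    return reward + gamma * (1.0 - done) * combined
```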

  • Article type: Journal Article
    Synchronization in complex networks is a ubiquitous and important phenomenon with implications in various fields. Excessive synchronization may lead to undesired consequences, making desynchronization techniques essential. Exploiting the Proximal Policy Optimization algorithm, this work studies reinforcement learning-based pinning control strategies for synchronization suppression in global coupling networks and two types of irregular coupling networks: the Watts-Strogatz small-world networks and the Barabási-Albert scale-free networks. We investigate the impact of the ratio of controlled nodes and the role of key nodes selected by the LeaderRank algorithm on the performance of synchronization suppression. Numerical results demonstrate the effectiveness of the reinforcement learning-based pinning control strategy in different coupling schemes of the complex networks, revealing a critical ratio of the pinned nodes and the superior performance of a newly proposed hybrid pinning strategy. The results provide valuable insights for suppressing and optimizing network synchronization behavior efficiently.
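
    As a concrete (and purely illustrative) setting for the pinning-control idea, the sketch below simulates a Kuramoto-type coupled network in which only a chosen fraction of "pinned" nodes receives the control input selected by an RL policy, and the reward is the negative order parameter so that lower synchrony is better. The coupling matrix, pinning ratio, and integration scheme are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, dt = 50, 2.0, 0.05
omega = rng.normal(0.0, 1.0, N)                      # natural frequencies
A = (rng.random((N, N)) < 0.1).astype(float)         # illustrative random coupling matrix
np.fill_diagonal(A, 0.0)
pinned = rng.choice(N, size=int(0.2 * N), replace=False)   # assumed 20% pinned nodes

def order_parameter(theta):
    return np.abs(np.mean(np.exp(1j * theta)))       # r = 1 means full synchronization

def step(theta, u):
    """One Euler step; u is the control applied only to the pinned nodes."""
    coupling = (K / N) * np.sum(A * np.sin(theta[None, :] - theta[:, None]), axis=1)
    dtheta = omega + coupling
    dtheta[pinned] += u                               # pinning control chosen by the RL agent
    theta = theta + dt * dtheta
    reward = -order_parameter(theta)                  # desynchronization objective
    return theta, reward
```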

  • Article type: Journal Article
    Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in the field of fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE); and some communication paradigms introduce added complexity to the learning process, complicating the focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to utilize a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound for the Q-value of the current policy. We then integrate a sequential update mechanism with optimistic Q-values for agents, aiming to ensure monotonic improvement in the joint policy optimization process. Moreover, we establish motivational communication modules for each agent to disseminate motivational messages that promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. The performance of OSSMC was rigorously evaluated against a series of challenging benchmark sets. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also exhibits a faster convergence rate.
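
    The "optimistic Q-value" is described only as a greedy upper bound on the current policy's Q-value, so the snippet below is a hedged sketch of how such a bound might enter a soft (entropy-regularized) critic target; the combination rule and names are assumptions for illustration, not OSSMC itself.

```python
import torch

def soft_q_target(q_policy, q_optimistic, log_pi, reward, done, gamma=0.99, alpha=0.2):
    """q_policy: Q of the sampled next action under the current policy;
       q_optimistic: greedy estimate assumed to act as an upper bound."""
    q_next = torch.max(q_policy, q_optimistic)        # optimistic bound on the next value
    soft_value = q_next - alpha * log_pi              # SAC-style entropy regularization
    return reward + gamma * (1.0 - done) * soft_value
```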

  • Article type: Journal Article
    Microfluidic mixers, a pivotal application of microfluidic technology, are primarily utilized for the rapid amalgamation of diverse samples within microscale devices. Given the intricacy of their design processes and the substantial expertise required from designers, the intelligent automation of microfluidic mixer design has garnered significant attention. This paper discusses an approach that integrates artificial neural networks (ANNs) with reinforcement learning techniques to automate the dimensional parameter design of microfluidic mixers. In this study, we selected two typical microfluidic mixer structures for testing and, using up to 10,000 sets of COMSOL simulation data, trained two neural network models that are both highly precise and cost-efficient as alternatives to traditional, time-consuming finite-element simulations. By defining effective state evaluation functions for the reinforcement learning agents, we used the trained agents to successfully validate the automated design of dimensional parameters for these mixer structures. The tests demonstrated that the first mixer model could be automatically optimized in just 0.129 s and the second in 0.169 s, significantly reducing design time compared with manual design. The simulation results validate the potential of reinforcement learning techniques in the automated design of microfluidic mixers, offering a new solution in this field.
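
    To make the surrogate-plus-search idea concrete, the sketch below uses a placeholder for the trained ANN surrogate (standing in for COMSOL) and a simple greedy random search over the mixer's dimensional parameters; this greedy loop is a stand-in for the paper's reinforcement-learning agent and state evaluation function, and all names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def surrogate_mixing_index(params):
    """Placeholder for the trained ANN surrogate of the finite-element simulation."""
    # In practice this would be something like ann_model.predict(params);
    # here a smooth toy function keeps the example self-contained.
    return -float(np.sum((params - np.array([0.4, 1.2, 0.8])) ** 2))

def optimize(params, n_steps=200, step_size=0.05):
    """Greedy random search: keep any perturbation the surrogate scores higher."""
    best, best_score = params.copy(), surrogate_mixing_index(params)
    for _ in range(n_steps):
        candidate = best + rng.normal(0.0, step_size, size=best.shape)
        score = surrogate_mixing_index(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

widths = np.array([0.5, 1.0, 1.0])      # illustrative dimensional parameters (e.g. channel widths)
print(optimize(widths))
```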

  • Article type: Journal Article
    This paper aims to solve the multi-objective operating planning problem in radioactive environments. First, a more complex radiation dose model is constructed that considers the difficulty level at each operating point. Based on this model, the multi-objective operating planning problem is converted into a variant traveling salesman problem (VTSP). Second, to address this problem, a novel combinatorial algorithm framework, the hyper-parameter adaptive genetic algorithm (HPAGA), which integrates bio-inspired optimization with reinforcement learning, is proposed; it allows adaptive adjustment of the GA's hyperparameters so that optimal solutions can be obtained efficiently. Third, comparative studies demonstrate the superior performance of the proposed HPAGA against classical evolutionary algorithms on various TSP instances. Additionally, a case study in a simulated radioactive environment suggests potential future applications of HPAGA.
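
    The abstract does not specify how the RL component adjusts the GA's hyperparameters, so the following is a hedged sketch of the general idea rather than HPAGA itself: a bandit-style Q-learning controller picks the mutation rate each generation and is rewarded by the improvement in the best tour length of a much-simplified TSP GA.

```python
import numpy as np

rng = np.random.default_rng(3)
MUTATION_RATES = [0.01, 0.05, 0.1, 0.2]       # actions available to the RL controller
q_values = np.zeros(len(MUTATION_RATES))      # single-state (bandit-style) Q-values

def tour_length(tour, dist):
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ga_generation(population, dist, mutation_rate):
    """One simplified GA generation for a TSP: keep the better half, then swap-mutate."""
    ranked = sorted(population, key=lambda t: tour_length(t, dist))
    next_pop = [list(t) for t in ranked[: len(population) // 2] for _ in range(2)]
    for tour in next_pop:
        if rng.random() < mutation_rate:
            i, j = rng.integers(len(tour), size=2)
            tour[i], tour[j] = tour[j], tour[i]    # swap mutation
    return next_pop, tour_length(ranked[0], dist)

def hpaga_step(population, dist, prev_best, eps=0.1, lr=0.1):
    """The controller picks a mutation rate; reward is the improvement of the best tour."""
    a = int(rng.integers(len(MUTATION_RATES))) if rng.random() < eps else int(np.argmax(q_values))
    population, best = ga_generation(population, dist, MUTATION_RATES[a])
    reward = prev_best - best
    q_values[a] += lr * (reward - q_values[a])
    return population, best
```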

  • Article type: Journal Article
    BACKGROUND: Depression is usually characterized by impairments in reward function and shows altered motivation toward reward in reinforcement learning. This study further explored whether task difficulty affects reinforcement learning in college students with and without depressive symptoms.
    METHODS: The depressive-symptom group (n = 20) and the no-depressive-symptom group (n = 26) completed a probabilistic reward learning task with low, medium, and high difficulty levels, in which the response bias toward reward and the discriminability of reward were analyzed. Additionally, electrophysiological responses to reward and loss feedback were recorded and analyzed while participants performed a simple gambling task.
    RESULTS: The depressive-symptom group showed a greater response bias toward reward than the no-depressive-symptom group when the task was easy and then exhibited a faster decrease in response bias as task difficulty increased. The no-depressive-symptom group showed a decrease in response bias only in the high-difficulty condition. Further regression analyses showed that the feedback-related negativity (FRN) and theta oscillations predicted the change in response bias in the low-difficulty condition, whereas the FRN and theta and delta oscillations predicted the change in response bias in the medium- and high-difficulty conditions.
    LIMITATIONS: The electrophysiological responses to loss and reward were not recorded in the same task as the reinforcement learning behaviors.
    CONCLUSIONS: College students with depressive symptoms are more sensitive to task difficulty during reinforcement learning. The FRN and oscillations of theta and delta could predict reward learning behavior.
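
    The abstract reports response bias and discriminability but does not give formulas; for reference, the snippet below shows the signal-detection measures commonly used for probabilistic reward tasks of this kind (an assumption about this study, not something stated in the abstract): response bias log b and discriminability log d computed from the counts of correct and incorrect responses to the frequently rewarded ("rich") and rarely rewarded ("lean") stimuli.

```python
import numpy as np

def log_b(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """Response bias toward the more frequently rewarded stimulus."""
    return 0.5 * np.log((rich_correct * lean_incorrect) /
                        (rich_incorrect * lean_correct))

def log_d(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """Discriminability of the two stimuli, independent of bias."""
    return 0.5 * np.log((rich_correct * lean_correct) /
                        (rich_incorrect * lean_incorrect))

# A participant favouring the rich stimulus shows a positive log b:
print(log_b(80, 20, 60, 40), log_d(80, 20, 60, 40))
```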

  • Article type: Journal Article
    Driver models are crucial for the safety assessment of autonomous vehicles (AVs) because of their role as reference models. Specifically, an AV is expected to achieve at least the same level of safety performance as a careful and competent driver model. To make this comparison possible, quantitative modeling of careful and competent drivers is essential. Thus, UNECE Regulation No. 157 proposes two driver models as benchmarks for AVs, enabling safety assessment of AV longitudinal behaviors. However, these two driver models cannot be applied in non-car-following scenarios, limiting their use in scenarios such as highway merging. To this end, we propose a careful and competent driver model for highway merging (CCDM2) scenarios using interpretable reinforcement learning-based decision-making and safety-constraint control. We compare our model's safe driving capabilities with those of human drivers in challenging merging scenarios and demonstrate the "careful" and "competent" characteristics of our model while ensuring its interpretability. The results indicate the model's capability to handle merging scenarios with even better safety performance than human drivers. This model is of great value for AV safety assessment in merging scenarios and contributes to future reference driver models to be included in AV safety regulations.
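
    The abstract does not detail the safety-constraint controller, so the snippet below is a minimal sketch under assumptions of one common pattern for such a layer: the RL policy's commanded acceleration during a merge is overridden whenever the predicted gap to the lead vehicle would drop below a safe distance derived from the two speeds, an assumed reaction time, and an assumed maximum braking rate.

```python
def safe_gap(v_ego, v_lead, reaction_time=1.0, max_brake=6.0):
    """Gap needed to stop safely if the lead vehicle brakes hard (simplified RSS-style bound)."""
    stopping_ego = v_ego * reaction_time + v_ego ** 2 / (2 * max_brake)
    stopping_lead = v_lead ** 2 / (2 * max_brake)
    return max(stopping_ego - stopping_lead, 0.0)

def constrain_action(accel_rl, gap, v_ego, v_lead, dt=0.1, max_brake=6.0):
    """Keep the policy's acceleration only if the resulting gap stays safe; otherwise brake."""
    v_next = v_ego + accel_rl * dt
    next_gap = gap + (v_lead - v_next) * dt           # first-order gap prediction
    if next_gap < safe_gap(v_next, v_lead):
        return -max_brake                             # fall back to a hard but feasible brake
    return accel_rl
```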

  • Article type: Journal Article
    Reinforcement learning based hyper-heuristics (RL-HH) are a popular trend in the field of optimization. RL-HH combines the global search ability of hyper-heuristics (HH) with the learning ability of reinforcement learning (RL). This synergy allows the agent to dynamically adjust its own strategy, leading to a gradual optimization of the solution. Existing research has shown the effectiveness of RL-HH in solving complex real-world problems. However, a comprehensive introduction to and summary of the RL-HH field is still lacking. This review surveys currently existing RL-HHs and presents a general framework for them. The algorithms are categorized into two types: value-based reinforcement learning hyper-heuristics and policy-based reinforcement learning hyper-heuristics. Typical algorithms in each category are summarized and described in detail. Finally, the shortcomings of existing research on RL-HH and future research directions are discussed.
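
    As a minimal illustration of the value-based family described in this review (an illustrative sketch, not any specific surveyed algorithm), the epsilon-greedy agent below learns a value for each low-level heuristic and is rewarded by the improvement in the objective of a minimization problem; the heuristics and problem interface are placeholders.

```python
import random

LOW_LEVEL_HEURISTICS = ["swap", "insert", "reverse"]   # illustrative operators
q_values = {h: 0.0 for h in LOW_LEVEL_HEURISTICS}

def select_heuristic(eps=0.1):
    if random.random() < eps:
        return random.choice(LOW_LEVEL_HEURISTICS)     # explore
    return max(q_values, key=q_values.get)             # exploit the best-valued heuristic

def rl_hh_step(solution, objective, apply_heuristic, lr=0.1):
    """Pick a heuristic, apply it, and reward it by the objective improvement."""
    h = select_heuristic()
    candidate = apply_heuristic(solution, h)
    reward = objective(solution) - objective(candidate)  # positive if the candidate is better
    q_values[h] += lr * (reward - q_values[h])           # value-based update
    return candidate if reward > 0 else solution
```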
