Keywords: Partially Observable Markov Decision Processes (POMDPs); Soft Actor-Critic (SAC); autonomous driving; deep reinforcement learning (DRL); multimodal state space; variational autoencoder (VAE)

Source: DOI: 10.3389/fnbot.2024.1338189

Abstract:
In real-world scenarios, navigation decisions for autonomous driving are made sequentially, based on partial observations of the environment while the underlying model of the environment remains unknown. A prevalent method for solving such problems is reinforcement learning, in which the agent learns from a sequence of rewards together with fragmentary and noisy observations. This study introduces an algorithm named deep reinforcement learning navigation via decision transformer (DRLNDT) to improve the decision-making of autonomous vehicles operating in partially observable urban environments. The DRLNDT framework is built around the Soft Actor-Critic (SAC) algorithm and uses Transformer neural networks to model the temporal dependencies in observations and actions, which helps mitigate judgment errors caused by sensor noise or occlusion within a given state. A variational autoencoder (VAE) extracts latent vectors from high-quality images, reducing the dimensionality of the state space and improving training efficiency. The multimodal state space consists of vector states, such as velocity and position, which the vehicle's intrinsic sensors can readily provide, together with the latent vectors derived from high-quality images, which support the agent's assessment of the current trajectory. Experiments demonstrate that DRLNDT achieves a superior policy without prior knowledge of the environment, detailed maps, or routing assistance, surpassing the baseline technique and other policy methods that lack historical data.
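To make the state construction concrete, below is a minimal PyTorch sketch, not the authors' released code, of the VAE-based compression the abstract describes: a convolutional encoder maps a camera frame to a low-dimensional latent vector, which is then concatenated with the vector states (velocity, position) to form the multimodal state. The layer sizes, the 64x64 input resolution, and the latent dimension are illustrative assumptions.

```python
# Hedged sketch of the VAE state compression described in the abstract.
# All architectural choices here are assumptions, not values from the paper.

import torch
import torch.nn as nn

LATENT_DIM = 64  # assumed latent size; the paper's choice may differ


class ConvVAE(nn.Module):
    """Convolutional VAE encoder mapping an RGB frame to a Gaussian latent."""

    def __init__(self, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),    # 64x64 -> 31x31
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),   # 31x31 -> 14x14
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),  # 14x14 -> 6x6
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 6 * 6, latent_dim)
        self.fc_logvar = nn.Linear(128 * 6 * 6, latent_dim)

    def forward(self, img: torch.Tensor):
        h = self.encoder(img)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar


def multimodal_state(vae: ConvVAE, img: torch.Tensor, vec: torch.Tensor):
    """Concatenate the image latent with vector states (velocity, position)."""
    with torch.no_grad():  # here the VAE only preprocesses observations
        _, mu, _ = vae(img)
    return torch.cat([mu, vec], dim=-1)
```

Using the posterior mean mu as the deterministic latent for the RL state is a common convention; the paper may instead feed sampled latents.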
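The temporal modelling could be sketched along similar lines: a Transformer encoder summarizes a short window of past (state, action) pairs into a context vector, on which a standard SAC squashed-Gaussian actor operates. The context length, model width, and head count below are assumptions, not values reported in the paper.

```python
# Hedged sketch of Transformer-based history encoding feeding an SAC actor.

import torch
import torch.nn as nn


class HistoryEncoder(nn.Module):
    """Transformer over the last K multimodal (state, action) pairs."""

    def __init__(self, state_dim: int, action_dim: int,
                 d_model: int = 128, k: int = 16):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(1, k, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, states: torch.Tensor, actions: torch.Tensor):
        # states: (B, K, state_dim), actions: (B, K, action_dim)
        x = self.embed(torch.cat([states, actions], dim=-1)) + self.pos
        h = self.encoder(x)
        return h[:, -1]  # context vector at the most recent step


class SACActor(nn.Module):
    """Squashed-Gaussian policy head, as in standard SAC."""

    def __init__(self, d_model: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, 256), nn.ReLU())
        self.mu = nn.Linear(256, action_dim)
        self.log_std = nn.Linear(256, action_dim)

    def forward(self, ctx: torch.Tensor):
        h = self.net(ctx)
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(self.mu(h), std)
        a = dist.rsample()          # reparameterized sample
        return torch.tanh(a)        # squash to the bounded action range
```

Conditioning the policy on the representation at the most recent time step is one common design; the paper may aggregate the history differently, and a full SAC implementation would additionally need the tanh log-probability correction and twin critics.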