Mobile Robot Path Planning Based on Improved Deep Reinforcement Learning

Author: Wang Jun, Yang Yunxiao, Li Li

Affiliation: 1. School of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China; 2. Liaoning Key Laboratory of Intelligent Technology for Chemical Process Industry, Shenyang 110142, China

CLC Number: TP242

Abstract:

In traditional deep reinforcement learning under sparse rewards, a mobile robot receives a positive reward only if it reaches the target position within the allotted number of time steps, while every intermediate step yields a negative reward, which makes path planning difficult. To address this, a path planning method based on an improved deep Q-network (DQN) is proposed. During exploration, the mobile robot samples trajectories conditioned on the real goal; during experience replay, states the robot has actually reached are substituted for the real goal, so that the robot obtains enough positive reward signals to start learning. A deep convolutional neural network takes raw RGB images as input and is trained end to end; its parameters are updated from mini-batch samples while actions are explored with an upper confidence bound (UCB) strategy, and the network outputs Q values for the four actions up, down, left and right. Results in the same simulation environment show that the algorithm improves sampling efficiency, iterates faster during training, converges more easily, and raises the success rate of reaching the goal while avoiding obstacles by about 40%, alleviating the problems caused by sparse rewards to some extent.
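The goal-relabeling scheme described above, substituting a state the robot actually reached for the real goal during experience replay, is essentially hindsight-style relabeling. The paper's implementation is not reproduced in this excerpt; the Python sketch below illustrates the idea under assumed names (ReplayBuffer, add_episode, reward_fn, k_relabel) and an assumed sparse reward of +1 on reaching the goal.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay with hindsight-style goal relabeling (illustrative sketch)."""

    def __init__(self, capacity=100_000, k_relabel=4):
        self.buffer = deque(maxlen=capacity)
        self.k_relabel = k_relabel  # relabeled copies per transition (assumed value)

    def add_episode(self, episode, reward_fn):
        # episode: list of (state, action, next_state, goal) tuples collected
        # while exploring under the real goal.
        for t, (s, a, s_next, goal) in enumerate(episode):
            # 1) original transition, rewarded against the real goal
            self.buffer.append((s, a, reward_fn(s_next, goal), s_next, goal))
            # 2) relabeled copies: pretend a state reached later in the episode
            #    was the goal, so the sparse reward becomes positive for them
            future = episode[t:]
            for _ in range(min(self.k_relabel, len(future))):
                _, _, achieved, _ = random.choice(future)
                self.buffer.append((s, a, reward_fn(s_next, achieved), s_next, achieved))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def reward_fn(next_state, goal):
    # Sparse reward: positive only when the (possibly relabeled) goal is reached.
    return 1.0 if next_state == goal else -0.01
```

Even an episode that never reaches the real goal then contributes transitions whose relabeled goal was reached, which is where the additional positive reward signal comes from.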

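The abstract also states that raw RGB images are fed to a deep convolutional network trained end to end, with Q values output for the four actions up, down, left and right. The exact architecture is not given in this excerpt, so the PyTorch sketch below uses a common DQN-style layout whose layer sizes are assumptions for an 84x84 RGB input.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional Q-network: raw RGB image in, four action values out (sketch)."""

    def __init__(self, n_actions=4):  # up, down, left, right
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 feature map for 84x84 input
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        # x: (batch, 3, 84, 84) RGB tensor scaled to [0, 1]
        return self.head(self.features(x))

if __name__ == "__main__":
    q = QNetwork()
    print(q(torch.rand(1, 3, 84, 84)).shape)  # torch.Size([1, 4])
```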
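Finally, the network parameters are said to be trained on mini-batch samples while exploration follows an upper confidence bound (UCB) strategy rather than plain epsilon-greedy. A minimal sketch of UCB action selection over the four Q values follows; the exploration coefficient c and the per-state action counts are illustrative assumptions.

```python
import math
import numpy as np

def ucb_action(q_values, action_counts, step, c=2.0):
    """Pick an action by Q value plus an upper-confidence exploration bonus (sketch).

    q_values:      length-4 array from the Q-network (up, down, left, right)
    action_counts: how many times each action has been tried in this state
    step:          total number of decisions made so far
    c:             exploration coefficient (assumed value)
    """
    counts = np.asarray(action_counts, dtype=float)
    bonus = c * np.sqrt(math.log(step + 1) / (counts + 1e-6))
    return int(np.argmax(np.asarray(q_values, dtype=float) + bonus))

# Rarely tried actions get a large bonus, so an action with a slightly lower
# Q value can still be selected for exploration.
print(ucb_action([0.50, 0.48, 0.10, 0.10], action_counts=[40, 2, 15, 15], step=72))
```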
Cite this article

Wang Jun, Yang Yunxiao, Li Li. Mobile robot path planning based on improved deep reinforcement learning [J]. 电子测量技术 (Electronic Measurement Technology), 2021, 44(22): 19-24.

History
  • Online publication date: 2024-07-04