Application of deep reinforcement learning in robot path planning
Affiliation:

1. College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China; 2. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

CLC number:

TP242

Fund:

Supported by the Shandong Provincial Natural Science Foundation Joint Fund (ZR2019LZH001)


Abstract:

To address the problems of inaccurate environment interaction information, sparse feedback, and unstable convergence that deep reinforcement learning algorithms encounter in path planning, a dueling deep Q-network algorithm based on a self-adjusting ε-greedy strategy and reward design is proposed. When exploring the environment, the agent uses an ε-greedy policy with a self-adjusting greedy factor: the exploration rate ε is determined by the convergence degree of the learning algorithm, so that the probabilities of exploration and exploitation are allocated reasonably. Based on the physical theory of the artificial potential field method, a potential-field reward function is designed that places a large attractive-potential reward at the target and repulsive-potential rewards near obstacles, enabling the agent to reach the goal faster. Simulation experiments in a 2D grid environment show that the algorithm achieves higher average reward and more stable convergence on maps of different sizes, improving the path planning success rate by 48.04%, which verifies the effectiveness and robustness of the algorithm for path planning. Compared with the Q-learning algorithm, the proposed method improves the path planning success rate by 28.14% and shows better environment exploration and path planning capabilities.

Cite this article

邓修朋, 崔建明, 李敏, 张小军, 宋戈. Application of deep reinforcement learning in robot path planning[J]. 电子测量技术, 2023, 46(6): 1-8.

History
  • Online publication date: 2024-02-19