Optimization of Mobile Robot Path Planning Based on Reinforcement Learning
Authors: 尹旷, 王红斌, 方健, 莫文雄, 叶建斌, 张宇

Affiliation: Power Test Research Institute of Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd, Guangzhou 510410, China

CLC Number: TP29; TP242.6

Abstract:

As the level of informatization continues to deepen, mobile robots are being applied ever more widely. In many cases, however, they must operate in complex, constantly changing environments, and because environmental information cannot be obtained in advance, it is often difficult to plan a suitable path for the robot. To address this problem, this paper proposes a path planning method for mobile robots. The method builds the environment model with the grid method, defines the reward value from the number of exploration steps, and continuously optimizes the path through reinforcement learning. To balance exploration and exploitation of the environment in reinforcement learning, a variable ε-decreasing action selection strategy and a learning rate selection method are proposed, making the exploration factor change dynamically as the agent's exploration of the environment progresses and thereby accelerating the convergence of the learning algorithm. Simulation results show that the method achieves autonomous navigation and fast path planning for mobile robots in complex environments; with the same resulting path length, the number of iterations is reduced by about 32% compared with the traditional reinforcement learning algorithm, effectively speeding up convergence.
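The paper's implementation is not included on this page. The sketch below is a minimal Python illustration of the approach the abstract describes: a grid-method environment model, a reward shaped by the number of exploration steps, and an ε-decreasing exploration factor with a decaying learning rate. Q-learning is assumed as the underlying algorithm (the abstract only says "reinforcement learning"), and the grid map, reward values, and decay schedules are placeholder assumptions, not the parameters reported in the paper.

import random

# 0 = free cell, 1 = obstacle; start at top-left, goal at bottom-right (assumed map)
GRID = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
ROWS, COLS = len(GRID), len(GRID[0])
START, GOAL = (0, 0), (ROWS - 1, COLS - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right

def step(state, action):
    """Apply an action on the grid; invalid moves keep the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < ROWS and 0 <= c < COLS and GRID[r][c] == 0:
        return (r, c)
    return state

# Tabular Q-values for every (cell, action) pair
Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS) for a in range(4)}

def epsilon_decreasing(episode, eps0=0.9, eps_min=0.05, decay=0.98):
    """Exploration factor that shrinks as the agent accumulates experience (assumed schedule)."""
    return max(eps_min, eps0 * decay ** episode)

def train(episodes=500, gamma=0.9, max_steps=200):
    for ep in range(episodes):
        eps = epsilon_decreasing(ep)
        alpha = max(0.1, 0.5 * 0.99 ** ep)            # assumed decaying learning rate
        state, n_steps = START, 0
        while state != GOAL and n_steps < max_steps:
            if random.random() < eps:                  # explore
                a = random.randrange(4)
            else:                                      # exploit current estimate
                a = max(range(4), key=lambda x: Q[(state, x)])
            nxt = step(state, ACTIONS[a])
            n_steps += 1
            # Reward tied to the step count: reaching the goal sooner earns more,
            # every intermediate step carries a small penalty.
            reward = 100.0 - n_steps if nxt == GOAL else -1.0
            best_next = max(Q[(nxt, x)] for x in range(4))
            Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
            state = nxt

if __name__ == "__main__":
    train()
    # Greedy rollout of the learned policy from START to GOAL
    state, path = START, [START]
    while state != GOAL and len(path) < ROWS * COLS:
        a = max(range(4), key=lambda x: Q[(state, x)])
        state = step(state, ACTIONS[a])
        path.append(state)
    print("planned path:", path)

In this sketch ε decays geometrically with the episode index, while the abstract ties the decrease to how thoroughly the agent has already explored the environment; that decay rule is the natural place to substitute the paper's schedule.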

Cite this article:

尹旷, 王红斌, 方健, 莫文雄, 叶建斌, 张宇. Optimization of Mobile Robot Path Planning Based on Reinforcement Learning[J]. 电子测量技术, 2021, 44(10): 91-95.

History
  • Published online: 2024-09-23