Research on Autonomous Driving Technology Based on an Improved PPO Algorithm

Authors: Yao Yueji, Ming Jia, Yang Xiao
Affiliation: North Automatic Control Technology Institute, Taiyuan 030000, China
CLC number: TP3
Funding: Supported by the Pre-research Project of the Science and Technology Commission of the Central Military Commission (2016330ZD01200101)


Abstract:

Reinforcement learning applied to end-to-end autonomous driving decision-making suffers from low sampling efficiency, poor adaptability to the environment, and weak decision performance. To address these problems, a recurrent proximal policy optimization (RPPO) algorithm is proposed. The policy network and the value network are built from LSTM and mobile inverted bottleneck convolution modules, which integrate the correlated information of consecutive frames so that the agent can anticipate changing situations and perceive its environment more quickly. An L2 regularization layer is added to the value network to further improve generalization. Finally, the agent is manually constrained to keep its action unchanged over 2 consecutive frames, which introduces prior knowledge that narrows the search space and speeds up convergence. In tests in the open-source CARLA simulation environment, the improved method achieves a clearly better reward curve than the traditional method, and the success rates on three types of tasks (driving straight, turning, and following a designated route) increase by 10%, 16%, and 30%, respectively, which shows that the proposed method is more effective.

Keywords: autonomous driving; reinforcement learning; mobile inverted bottleneck convolution; LSTM
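
The abstract says the agent keeps its action unchanged over 2 consecutive frames, but it gives no implementation details. The following is a minimal Python sketch of that idea as an action-repeat wrapper. The names `ActionRepeat` and `CountingEnv`, the `reset()`/`step()` interface, and the choice to sum per-frame rewards are all assumptions made for illustration; they are not the authors' code and not the CARLA API.

```python
class ActionRepeat:
    """Apply each chosen action for `repeat` consecutive frames.

    Assumed interface (hypothetical): `env.reset()` returns an observation
    and `env.step(action)` returns `(obs, reward, done)`.
    """

    def __init__(self, env, repeat=2):
        if repeat < 1:
            raise ValueError("repeat must be >= 1")
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done = None, False
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward  # assumption: rewards are summed over frames
            if done:
                break  # never step an episode that has already ended
        return obs, total_reward, done


class CountingEnv:
    """Toy stand-in for a simulator: the observation is the frame index."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        reward = 1.0  # constant per-frame reward, for illustration only
        done = self.frame >= self.horizon
        return self.frame, reward, done


if __name__ == "__main__":
    env = ActionRepeat(CountingEnv(horizon=5), repeat=2)
    env.reset()
    done = False
    while not done:
        obs, reward, done = env.step("forward")
        print(obs, reward, done)
```

With `repeat=2` the policy is queried once every two frames, so the number of decisions per episode is roughly halved. That is one plausible reading of how this prior narrows the search space.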

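Cite this article

Yao Yueji, Ming Jia, Yang Xiao. Research on autonomous driving technology based on improved PPO algorithm[J]. Electronic Measurement Technology, 2023, 46(8): 162-168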


History

Published online: 2024-02-07
