Optimization algorithm based on value decomposition of multi-agent reinforcement learning


Authors: Wang Huiqin, Miao Guoying, Sun Yingbo
Affiliation: School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China
CLC number: TP242
Fund Project: Supported by the National Natural Science Foundation of China (62073169) and the Jiangsu Province "333 Project" (BRA2020067)

Abstract:

Existing value-decomposition algorithms for multi-agent reinforcement learning do not fully account for the cooperative relationships among agents, and the stochastic policies they use for exploration tend to overshoot the optimum and become trapped in local optima. To address these problems, this paper proposes a deep-communication multi-agent reinforcement learning algorithm. A communication mechanism built from convolutional and fully connected layers is designed within the value decomposition network to strengthen cooperation among agents. A new adaptive exploration strategy is then proposed, with a periodic decay schedule added to balance exploration against exploitation. Finally, simulation results show that the proposed method achieves a performance improvement of 25.8% in some scenarios and improves the agents' ability to cooperate.

Key words: multi-agent; reinforcement learning; deep communication; exploration strategy
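
The abstract mentions an exploration strategy with periodic decay but does not give its formula. As a rough illustration only, the sketch below shows one way a periodically decaying epsilon-greedy schedule could look. The cosine shape, the geometric shrinking of each cycle's peak, and every name and default value (`periodic_epsilon`, `period`, `eps_max`, `eps_min`, `shrink`, `select_action`) are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def periodic_epsilon(step, period=1000, eps_max=1.0, eps_min=0.05, shrink=0.5):
    """Exploration rate that decays within each period, then restarts lower.

    Hypothetical schedule: the shape and all defaults are illustrative
    assumptions, not the schedule used in the paper.
    """
    cycle, pos = divmod(step, period)
    # The peak of each cycle shrinks geometrically but never goes below eps_min.
    peak = max(eps_min, eps_max * shrink ** cycle)
    # Cosine decay from the cycle's peak down towards eps_min.
    frac = pos / period
    return eps_min + 0.5 * (peak - eps_min) * (1.0 + math.cos(math.pi * frac))


def select_action(q_values, step, rng=random):
    """Epsilon-greedy choice over one agent's action values."""
    if rng.random() < periodic_epsilon(step):
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the largest action value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Under these assumptions, the rate starts at `eps_max`, falls towards `eps_min` over one period, and then jumps back up to a smaller peak, so exploration is renewed periodically while the overall trend still favours exploitation.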

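Cite this article

Wang Huiqin, Miao Guoying, Sun Yingbo. Optimization algorithm based on value decomposition of multi-agent reinforcement learning[J]. Electronic Measurement Technology, 2023, 46(7): 73-79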

Article metrics

Views: 264
Downloads: 618
HTML views: 0
Citations: 0

History

Published online: 2024-02-18
