Research on multi-objective edge task scheduling based on deep reinforcement learning

Authors: 盛煜, 朱正伟, 朱晨阳, 诸燕平

Affiliations: 1. School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213000, China; 2. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213000, China

CLC number: TP393

Fund project: Supported by the Changzhou Key Research and Development Program (CJ20210123)



Abstract:

To address the unstable convergence and poor optimization performance of deep reinforcement learning for multi-objective task scheduling in edge computing environments, a new multi-objective task scheduling algorithm based on an improved dueling double deep Q-network (IMTS-D3QN) is proposed. First, the double deep Q-network decomposes the max operation in the target into action selection and action evaluation, eliminating overestimation. An immediate-reward-based classification method stores experience samples by importance, and training draws more of the high-importance samples, which improves sample utilization and speeds up the training of the neural network. Then, the neural network is optimized by introducing a dueling network architecture. Finally, a soft update method improves the stability of the algorithm, and a dynamic ε-greedy method with exponential decay is used to find the optimal policy. Pareto-optimal solutions are obtained through different linear weighting combinations, minimizing response time and energy consumption. Experimental results show that, compared with other algorithms, IMTS-D3QN achieves clear reductions in both response time and energy consumption across different numbers of tasks.
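The network-side ingredients named in the abstract can be illustrated with a minimal PyTorch sketch: a dueling architecture, a double-Q target that separates action selection (online network) from action evaluation (target network), and a soft (Polyak) target update. Layer sizes, τ, γ, and all identifiers below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (PyTorch assumed): dueling network, double-Q target,
# and soft target update. All sizes and hyperparameters are placeholders.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, then value and advantage streams."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)  (standard dueling aggregation)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(online, target, r, s_next, done, gamma=0.99):
    """Double DQN: the online net selects the action, the target net evaluates it,
    which removes the overestimation caused by a single max operation."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)    # action selection
        q_next = target(s_next).gather(1, a_star).squeeze(1)   # action evaluation
        return r + gamma * (1.0 - done) * q_next

def soft_update(target, online, tau=0.005):
    """Soft update: target <- tau * online + (1 - tau) * target."""
    for tp, op in zip(target.parameters(), online.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * op.data)
```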

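The sampling and exploration steps can be sketched the same way. The assumed names ClassifiedReplayBuffer and epsilon below illustrate the immediate-reward classification of experience samples and the dynamic ε-greedy schedule with exponential decay; the reward threshold, high-importance sampling fraction, and decay rate are placeholders, not values from the paper.

```python
# Sketch of reward-classified experience replay and exponentially decaying
# epsilon-greedy exploration. Thresholds, fractions, and names are assumptions.
import math
import random
from collections import deque

class ClassifiedReplayBuffer:
    """Store transitions in two pools by immediate reward and draw more
    samples from the high-importance pool during training."""
    def __init__(self, capacity=10000, reward_threshold=0.0, high_frac=0.7):
        self.high = deque(maxlen=capacity)
        self.low = deque(maxlen=capacity)
        self.threshold = reward_threshold
        self.high_frac = high_frac

    def push(self, transition, reward):
        (self.high if reward >= self.threshold else self.low).append(transition)

    def sample(self, batch_size):
        # Fill most of the batch from the high-importance pool, the rest
        # from the low pool; the batch shrinks if a pool is still small.
        n_high = min(int(batch_size * self.high_frac), len(self.high))
        batch = random.sample(list(self.high), n_high)
        batch += random.sample(list(self.low),
                               min(batch_size - n_high, len(self.low)))
        return batch

def epsilon(step, eps_start=1.0, eps_end=0.05, decay=0.001):
    """Dynamic epsilon-greedy: exponential decay from eps_start toward eps_end."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay * step)
```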
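For the multi-objective step, a hedged sketch of how different linear weighting combinations could trace the Pareto front over response time and energy consumption; the negated weighted sum and the dominance test are generic formulations, not taken from the paper.

```python
# Sketch of linear scalarization: each weight in [0, 1] yields one scheduling
# policy, and the non-dominated (time, energy) pairs among the resulting
# policies form the Pareto front. Both objectives are minimized.
def scalarized_reward(response_time, energy, w_time):
    """Negated weighted sum of the two objectives, so larger reward is better."""
    return -(w_time * response_time + (1.0 - w_time) * energy)

def pareto_front(results):
    """Keep (time, energy) points not dominated by any other point:
    q dominates p if q is no worse in both objectives and better in one."""
    return [p for p in results
            if not any(q[0] <= p[0] and q[1] <= p[1] and
                       (q[0] < p[0] or q[1] < p[1]) for q in results)]
```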
Cite this article

盛煜, 朱正伟, 朱晨阳, 诸燕平. Research on multi-objective edge task scheduling based on deep reinforcement learning [J]. Electronic Measurement Technology, 2023, 46(8): 74-81.

History
  • Online publication date: 2024-02-07