Abstract: This paper investigates anti-jamming communications with intelligent frequency hopping in complex electromagnetic environments. Specifically, it proposes a new mixed deep recurrent Q-learning network (MixDRQN) for learning the optimal anti-jamming strategy through reinforcement learning (RL). The proposed deep RL algorithm combines the double deep Q-learning network (DoubleDQN) and the dueling deep Q-learning network (DuelingDQN), and further introduces a long short-term memory (LSTM) layer to preprocess the time-sensitive inputs. By using DoubleDQN, the proposed algorithm mitigates the Q-value over-estimation of the standard deep Q-learning network, which arises because the same network both selects and evaluates the greedy action in the learning target. Meanwhile, the DuelingDQN structure and the LSTM layer are shown to be effective for learning the time-correlated features of the inputs. Extensive experimental results show that both the convergence speed and the anti-jamming performance are significantly improved; in particular, the proposed RL algorithm converges more than 8 times faster than existing RL algorithms.
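To make the two value-learning ingredients concrete, the following is a minimal sketch, not the authors' MixDRQN implementation. It uses plain Python lists in place of network outputs and omits the LSTM layer and all training machinery. It assumes, for illustration only, that each action is a candidate hopping channel. The function names `dueling_q`, `double_dqn_target` and `vanilla_dqn_target` are hypothetical. The sketch shows the standard dueling aggregation Q = V + A - mean(A) and the standard DoubleDQN target, in which the online network selects the next action and the target network evaluates it.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      online_next_q: Sequence[float],
                      target_next_q: Sequence[float]) -> float:
    """DoubleDQN target: the online net picks a*, the target net scores it."""
    if done:
        return reward
    # Action selection uses the online network's Q-values ...
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    # ... while action evaluation uses the target network's Q-values.
    return reward + gamma * target_next_q[best_action]


def vanilla_dqn_target(reward: float, done: bool, gamma: float,
                       target_next_q: Sequence[float]) -> float:
    """Standard DQN target: the same max both selects and evaluates (over-estimation prone)."""
    if done:
        return reward
    return reward + gamma * max(target_next_q)


if __name__ == "__main__":
    # Hypothetical next-state outputs for 4 candidate hopping channels.
    online_next = dueling_q(1.0, [0.5, -0.5, 0.25, -0.25])   # [1.5, 0.5, 1.25, 0.75]
    target_next = dueling_q(1.0, [-0.25, 0.75, 0.0, -0.5])   # [0.75, 1.75, 1.0, 0.5]
    print(double_dqn_target(1.0, False, 0.5, online_next, target_next))  # 1.375
    print(vanilla_dqn_target(1.0, False, 0.5, target_next))              # 1.875
```

In this toy case the DoubleDQN target (1.375) is lower than the standard DQN target (1.875), because the target network's largest value is not taken unless the online network also prefers that channel.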