Abstract: To address the insufficient cross-modal fusion and low detection efficiency of current object detection methods that use RGB-D images, a feature-level fusion network based on an attention mechanism is proposed. First, with YOLOv3 as the backbone, the RGB and depth streams are trained separately on labeled RGB-D samples; the features from the two streams are then enhanced by an attention module, and the final feature weights are obtained by layer-by-layer fusion in the middle of the network. Evaluated on the challenging NYU Depth v2 dataset, the proposed method achieves an average accuracy of 77.8%. Comparative experiments show that the proposed attention-based fusion network significantly outperforms similar algorithms.
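The two-stream attention fusion described above can be illustrated with a minimal sketch. This is not the paper's implementation: the channel-attention form (global average pooling followed by a sigmoid-gated re-weighting, as in squeeze-and-excitation modules), the element-wise sum used for fusion, and all weight matrices are illustrative assumptions, shown here with random weights in plain NumPy.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w):
    """Re-weight channels of a (C, H, W) feature map.

    w is a (C, C) weight matrix; in a real network it would be learned,
    here it is random for illustration only.
    """
    pooled = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    weights = sigmoid(w @ pooled)          # per-channel attention weights in (0, 1)
    return feat * weights[:, None, None]   # broadcast weights over H, W

def fuse(rgb_feat, depth_feat, w_rgb, w_depth):
    """Enhance each modality with its own attention, then fuse element-wise."""
    return channel_attention(rgb_feat, w_rgb) + channel_attention(depth_feat, w_depth)

# Toy intermediate feature maps standing in for mid-network RGB and depth features.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
fused = fuse(rgb, depth, rng.standard_normal((C, C)), rng.standard_normal((C, C)))
print(fused.shape)  # same (C, H, W) shape as each input stream
```

In the full network this fusion would be applied at several intermediate layers of the YOLOv3 backbone rather than once, so that the detector's heads receive features that already combine appearance (RGB) and geometry (depth) cues.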