Abstract: The feature pyramid network (FPN) has become an effective framework for extracting multi-scale features in object detection. However, FPN has three problems: channel reduction loses semantic information, high-level features contain only single-scale context information, and directly fusing features from different layers, which differ semantically, produces aliasing effects. To address these problems, this paper proposes an attention enhancement guided feature pyramid network (AEGFPN), which consists of a channel feature enhancement module, a context enhancement module, and an attention guidance fusion module. Specifically, the channel feature enhancement module reduces the information loss caused by channel reduction by modeling the dependencies between features; the context enhancement module extracts context information from features at different levels to enhance the high-level features; and the attention guidance fusion module uses an attention mechanism to guide feature learning in adjacent layers so that their semantic information becomes more consistent. This paper replaces the FPN in the Faster R-CNN and Mask R-CNN object detectors with AEGFPN and performs experiments on different datasets. The experimental results show that the improved Faster R-CNN detector increases average precision by 1.5% and 1% on the PASCAL VOC and MS COCO datasets, respectively, and that the improved Mask R-CNN detector improves Mask AP and Box AP by 0.8% and 1.1%, respectively, on the MS COCO dataset.
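For orientation, the baseline FPN step that the abstract criticizes can be sketched as follows. This is a minimal, dependency-free illustration of the standard FPN top-down fusion (1x1 lateral convolution for channel reduction, nearest-neighbour upsampling, element-wise addition). It is not the proposed AEGFPN, whose modules are not specified in the abstract. The function names, the nested-list tensor layout `[channel][row][col]`, and the toy weights are all assumptions made for illustration.

```python
# Sketch of the baseline FPN top-down fusion (NOT the proposed AEGFPN).
# Feature maps are nested lists indexed as [channel][row][col] (assumed layout).


def conv1x1(feature, weights):
    """1x1 convolution: weights[o][i] mixes input channel i into output channel o.

    In FPN this lateral layer reduces the channel count, which is the source
    of the information loss mentioned in the abstract.
    """
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [sum(w * feature[i][r][c] for i, w in enumerate(out_weights))
             for c in range(width)]
            for r in range(height)
        ]
        for out_weights in weights
    ]


def upsample2x(feature):
    """Nearest-neighbour 2x upsampling of every channel."""
    upsampled = []
    for channel in feature:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))  # duplicate the row for the vertical 2x
        upsampled.append(rows)
    return upsampled


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shape."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def fpn_top_down(c_low, c_high, w_low, w_high):
    """Fuse a shallow, high-resolution map with a deep, low-resolution map."""
    lateral_low = conv1x1(c_low, w_low)      # channel reduction (shallow level)
    lateral_high = conv1x1(c_high, w_high)   # channel reduction (deep level)
    # Direct fusion: the two maps are simply added, with no attention or
    # other mechanism reconciling their semantic differences.
    return add_maps(lateral_low, upsample2x(lateral_high))


# Toy usage: a 2-channel 2x2 shallow map and a 4-channel 1x1 deep map,
# both reduced to a single channel before fusion.
c_low = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]
c_high = [[[1]], [[2]], [[3]], [[4]]]
fused = fpn_top_down(c_low, c_high, w_low=[[1.0, 1.0]], w_high=[[0.25] * 4])
print(fused)  # [[[3.5, 4.5], [5.5, 6.5]]]
```

In this baseline, the deep map's four channels collapse into one before fusion and the result is added to the shallow map unchanged. These are the two points that the channel feature enhancement and attention guidance fusion modules are described as addressing.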