Abstract: In UAV reconnaissance, security monitoring, and autonomous driving, target detection faces significant challenges. Targets in images often appear at multiple scales, which makes small targets particularly difficult to detect, and targets are frequently occluded to varying degrees. To address these issues, this paper proposes a dynamic multi-scale target detection model, YOLO-DDE. First, novel CEMA and CED convolutional modules are introduced to enhance the backbone network's ability to handle multi-scale information and extract fine-grained features, enabling more precise recognition in complex scenes. Second, the FPAN network structure is redesigned as the DFPN structure, which employs longitudinal cross-scale fusion to improve the model's multi-scale feature fusion. Finally, a dynamic detection head, the DD-Head structure, is introduced to strengthen the model's ability to handle downstream tasks. Experiments on the PASCAL VOC dataset were conducted to validate the proposed model. Compared with the current state-of-the-art model YOLOv8, YOLO-DDE improves the evaluation metrics mAP50 and mAP50:95 by 1.8% and 3.2%, respectively. Furthermore, generalization experiments on the VisDrone, HIT-UAV, and FAIR1M2.0 datasets show that the model generalizes well. In summary, the dynamic multi-scale structure of YOLO-DDE offers a new way to improve target detection performance.
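For readers unfamiliar with the two evaluation metrics reported above, the following minimal sketch illustrates the IoU thresholds they are conventionally built on: the first metric counts a detection as correct at an IoU of at least 0.5, while the second averages over the thresholds 0.50, 0.55, ..., 0.95. This is an illustration under the common COCO-style convention, not the evaluation code used in this paper; the names `iou`, `IOU_THRESHOLDS`, and `matched_thresholds` are hypothetical helpers introduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Thresholds at which a prediction would count as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

A prediction that overlaps its ground-truth box only loosely contributes at the 0.5 threshold but not at the stricter ones, which is why the averaged metric rewards tighter localization.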