Abstract: Drivers who use cell phones or smoke while driving threaten traffic safety. To detect these behaviors, this paper proposes an improved network model based on YOLOv7. First, the MobileNetv3 backbone replaces the original YOLOv7 backbone to reduce the number of model parameters and the amount of computation and to improve processing speed. Second, depthwise separable convolution and sub-pixel convolution are used to build an improved feature pyramid branch, which is fused with the output feature layers of the original feature pyramid to enrich the feature information and enhance feature extraction. Finally, a feature enhancement module is applied to the fused feature layers to strengthen attention over both feature channels and spatial regions. Experimental results show that the improved network model achieves a mean average precision of 95.33% and a detection speed of 75.31 frames per second. Compared with the original YOLOv7 network, the mean average precision increases by 6.84% and the detection speed increases by 17.25 frames per second. The model thus achieves higher detection accuracy while meeting real-time requirements, and can detect drivers' cell phone use and smoking behavior accurately and in real time.
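As a rough illustration of why the separable and sub-pixel convolutions named above help keep the added feature pyramid branch light, the sketch below counts weights for a standard convolution and for a depthwise separable one, and computes the output shape of a sub-pixel (pixel shuffle) upsampling step. The function names, channel counts, kernel size, and upscale factor are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
# Minimal sketch under stated assumptions: the layer sizes below are
# hypothetical and are not the configuration used in the paper.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def pixel_shuffle_shape(c: int, h: int, w: int, r: int) -> tuple:
    """Output (C, H, W) of sub-pixel upsampling with upscale factor r."""
    if c % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    # Channels are rearranged into space: C / r^2 channels, r times the size.
    return (c // (r * r), h * r, w * r)


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3  # hypothetical layer sizes
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard conv weights:  {std}")
    print(f"separable conv weights: {sep}")
    print(f"ratio: {sep / std:.4f}")  # equals 1/c_out + 1/k^2
    print("pixel shuffle output shape:", pixel_shuffle_shape(256, 20, 20, 2))
```

For these assumed sizes the separable layer needs roughly 11.5% of the weights of the standard layer, and the sub-pixel step doubles spatial resolution by rearranging channels rather than by learning an additional upsampling kernel.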