Abstract: To recognize subtle driver actions against visually similar backgrounds, X3D-M-GC-AE, a network based on X3D, is proposed. The lightweight self-attention network GCNet is introduced to strengthen attention to key features in time and space, improving accuracy without increasing the parameter count. An action enhancement block is designed to make the network more sensitive to action information in the time series. Knowledge distillation is introduced, with X3D-XL as the teacher network and X3D-M-GC-AE as the student network, so that X3D-M-GC-AE can be deployed in real vehicles with fewer parameters and less computation. The experimental results show that the maximum test accuracy of the teacher network reaches 75.56% and that of the student network reaches 71.13%. The framework thus achieves high-accuracy results while placing low demands on in-vehicle hardware.
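The teacher-student setup can be illustrated with a minimal sketch of a distillation loss. The abstract does not state which loss is used, so this assumes the common soft-target formulation: a temperature-softened KL term between teacher and student outputs, blended with the usual cross-entropy on the true label. The function names and the `temperature` and `alpha` parameters are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits scaled by 1/temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term with hard-label cross-entropy.

    Assumed formulation (not confirmed by the source):
        alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy of the student (temperature 1) on the true label.
    ce = -log_softmax(student_logits)[label]

    # T^2 keeps the soft-target gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Example: logits for three hypothetical action classes, true class index 2.
teacher = [0.0, 0.0, 3.0]
student = [2.0, 0.5, -1.0]
loss = distillation_loss(student, teacher, label=2)
```

In this sketch, a student whose logits match the teacher's incurs no KL penalty, so only the cross-entropy term remains. A larger `alpha` puts more weight on imitating the teacher than on fitting the labels.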