Abstract: Traditional graph convolutional networks ignore the relationship between spatial and temporal features. To address this problem, a dual-stream network model that combines a residual structure with a graph convolutional network is proposed. First, the network consists of two channels, a spatial stream and a temporal stream. The gesture skeleton information is constructed into a spatial graph and a temporal sequence graph, which serve as the inputs of the two channels. Separating the temporal and spatial dimensions greatly improves training speed. Next, to deepen the network while avoiding problems such as vanishing gradients, an improved residual structure is embedded, which makes more effective use of temporal features and preserves feature diversity. Finally, the spatial node-set sequence and the temporal edge-set sequence output by the two channels are concatenated and fed into a Softmax classifier to obtain the recognition result. The proposed method is evaluated on the CSL and DEVISIGN-L sign language datasets, where it reaches recognition accuracies of 96.2% and 69.3%, respectively, which demonstrates the effectiveness of the method.
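As an illustration of the dual-stream idea only, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one graph-convolution layer per stream using the common symmetric normalization D^-1/2 (A + I) D^-1/2, a plain identity shortcut standing in for the (unspecified) improved residual structure, mean pooling over nodes, concatenation of the two stream outputs, and a Softmax classifier. The function names (`normalize_adjacency`, `residual_gcn_layer`, `dual_stream_forward`, `chain_adjacency`), the toy chain-shaped graphs, the sizes, and the random weights are all hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops so every degree is >= 1
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

def residual_gcn_layer(x, a_norm, w):
    """ReLU(A_norm X W) plus an identity shortcut (w must be square)."""
    return np.maximum(a_norm @ x @ w, 0.0) + x

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def dual_stream_forward(x_space, a_space, x_time, a_time, w_s, w_t, w_cls):
    # Spatial stream (hypothetical setup): nodes are skeleton joints, edges are bones.
    h_s = residual_gcn_layer(x_space, normalize_adjacency(a_space), w_s)
    # Temporal stream (hypothetical setup): nodes are frames, edges link consecutive frames.
    h_t = residual_gcn_layer(x_time, normalize_adjacency(a_time), w_t)
    # Mean-pool each stream over its nodes, concatenate the two vectors, then classify.
    fused = np.concatenate([h_s.mean(axis=0), h_t.mean(axis=0)])
    return softmax(fused @ w_cls)

def chain_adjacency(n):
    """Toy undirected graph in which node i is linked to node i + 1."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
    return adj

# Usage with random inputs and untrained weights (all sizes are hypothetical).
rng = np.random.default_rng(0)
J, T, C, K = 5, 8, 3, 4  # joints, frames, feature channels, gesture classes
x_space = rng.normal(size=(J, C))  # per-joint features
x_time = rng.normal(size=(T, C))   # per-frame features
probs = dual_stream_forward(
    x_space, chain_adjacency(J), x_time, chain_adjacency(T),
    rng.normal(size=(C, C)), rng.normal(size=(C, C)), rng.normal(size=(2 * C, K)),
)
print(probs.shape)  # (4,)
```

The identity shortcut lets each layer's input pass through unchanged alongside the convolved features, which is the usual reason residual connections ease the training of deeper stacks. A real model would stack many such layers, learn the weights, and use the paper's own graph construction and improved residual block.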