Abstract: Pedestrian tracking is an active topic in deep learning research. Existing tracking algorithms often fail to run in real time and suffer from frequent ID switches, which are caused by high similarity between targets, occlusion between targets, and irregular motion. To improve running speed, a lightweight network combining a CNN and a transformer is used in the target detection stage, and a joint detection method shares feature weights so that the detection, re-identification, and human pose estimation branches are computed in parallel; the number of convolution channels in each branch is adjusted at the same time. The tracking stage combines the target motion information predicted by Kalman filtering, the target re-identification information, and the positions of the key points of the target pose to match target identities, which reduces ID switches for the same target. The MOT16 dataset is used for training and testing. The algorithm achieves a multi-object tracking accuracy (MOTA) of 48.5% and a multi-object tracking precision (MOTP) of 78.17%, runs at 20 FPS, and has a model size of 18.4M. Experiments show that the proposed tracking algorithm improves overall tracking performance and that its real-time performance and accuracy meet the expected requirements.
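The identity matching step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the cost weights, the gating threshold `max_cost`, the greedy assignment, and all function and field names (`fused_cost`, `match`, `box`, `embed`, `kps`) are assumptions made for illustration. Each track is assumed to carry a Kalman-predicted box `(cx, cy, w, h)`, a re-identification embedding, and a list of pose key points.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two embedding vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

def fused_cost(track, det, weights=(0.3, 0.5, 0.2)):
    """Weighted sum of motion, re-ID, and pose costs (weights are hypothetical)."""
    w_motion, w_reid, w_pose = weights
    tx, ty, tw, th = track["box"]  # box predicted by the Kalman filter
    dx, dy, _, _ = det["box"]      # box produced by the detector
    scale = math.hypot(tw, th)     # box diagonal, used to normalize distances
    # Motion cost: center distance relative to the box diagonal, clipped to 1.
    motion = min(1.0, math.hypot(tx - dx, ty - dy) / scale)
    # Appearance cost from the re-identification embeddings.
    reid = cosine_distance(track["embed"], det["embed"])
    # Pose cost: mean key point displacement relative to the diagonal, clipped to 1.
    dists = [math.hypot(px - qx, py - qy)
             for (px, py), (qx, qy) in zip(track["kps"], det["kps"])]
    pose = min(1.0, (sum(dists) / len(dists)) / scale) if dists else 1.0
    return w_motion * motion + w_reid * reid + w_pose * pose

def match(tracks, dets, max_cost=0.7):
    """Greedy one-to-one assignment, lowest fused cost first."""
    candidates = sorted(
        (fused_cost(t, d), i, j)
        for i, t in enumerate(tracks)
        for j, d in enumerate(dets)
    )
    used_tracks, used_dets, pairs = set(), set(), []
    for cost, i, j in candidates:
        if cost > max_cost:
            break  # remaining candidates are even more expensive
        if i in used_tracks or j in used_dets:
            continue
        used_tracks.add(i)
        used_dets.add(j)
        pairs.append((i, j))
    return sorted(pairs)
```

Calling `match(tracks, dets)` returns a sorted list of `(track_index, detection_index)` pairs; tracks and detections left out of the list are unmatched. Greedy assignment is used here only to keep the sketch dependency-free; an optimal assignment such as the Hungarian algorithm could be substituted over the same cost.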