Abstract: Visible-infrared person re-identification is a cross-modal retrieval problem. Accurately matching pedestrians remains challenging due to the large modality discrepancy between visible and infrared images. Recent research has shown that pooling-based descriptors of local body-part features, combined with global features of the whole person image, yield a robust representation even when body parts are missing; however, simple global average pooling struggles to capture fine-grained pedestrian details. To address this problem, this paper proposes a new global multi-granularity pooling approach that combines global average pooling (GAP) and global max pooling (GMP) to extract richer background and texture information about the person. In addition, the conventional triplet loss performs poorly for cross-modal person re-identification, so we design a new cross-modal triplet loss that optimizes intra-class and inter-class distances and supervises the network to learn discriminative feature representations. Experiments demonstrate the effectiveness of the proposed method, which achieves 88.01% Rank-1 / 79.26% mAP on RegDB and 60.24% Rank-1 / 57.50% mAP on SYSU-MM01.
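The abstract does not give the exact formulation of the multi-granularity pooling, so the following is only a minimal PyTorch sketch of the general GAP+GMP idea it describes. The class name `MultiGranularityPooling` and the element-wise-sum fusion are assumptions; the paper may instead concatenate the two pooled vectors or fuse them at multiple backbone stages.

```python
import torch
import torch.nn as nn

class MultiGranularityPooling(nn.Module):
    """Sketch: fuse global average pooling (GAP) and global max pooling (GMP)."""

    def __init__(self):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # averages over H x W: smooth context/background statistics
        self.gmp = nn.AdaptiveMaxPool2d(1)  # max over H x W: salient texture responses

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W) backbone feature map
        avg = self.gap(feat_map).flatten(1)  # (B, C)
        mx = self.gmp(feat_map).flatten(1)   # (B, C)
        return avg + mx  # assumed fusion rule; concatenation is an equally plausible choice


# Example: pool a ResNet-style feature map into a (B, 2048) embedding
pool = MultiGranularityPooling()
embedding = pool(torch.randn(8, 2048, 18, 9))  # -> shape (8, 2048)
```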
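Likewise, the abstract names a cross-modal triplet loss without defining it; a common way to make a triplet loss cross-modal is to mine, for each anchor in one modality, the hardest positive and hardest negative from the other modality. The sketch below follows that convention under assumptions not stated in the source: batch-hard mining, a margin of 0.3, a shared label tensor for both modalities, and a PK sampler so every identity appears in both modalities. The function name `cross_modal_triplet_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(vis_feats, ir_feats, labels, margin=0.3):
    """Sketch: hard-mined triplet loss whose positives and negatives
    are drawn from the opposite modality (visible vs. infrared)."""

    def directed(anchors, gallery):
        dist = torch.cdist(anchors, gallery)                         # (B, B) pairwise L2 distances
        pos = labels.unsqueeze(1) == labels.unsqueeze(0)             # same-identity mask
        d_ap = dist.masked_fill(~pos, float('-inf')).max(1).values   # hardest cross-modal positive
        d_an = dist.masked_fill(pos, float('inf')).min(1).values     # hardest cross-modal negative
        return F.relu(d_ap - d_an + margin).mean()

    # Constrain both directions: visible anchors against infrared
    # gallery, and infrared anchors against visible gallery.
    return directed(vis_feats, ir_feats) + directed(ir_feats, vis_feats)


# Example usage with a PK-style batch: 8 identities x 4 images per modality
vis = torch.randn(32, 2048)                    # pooled visible embeddings
ir = torch.randn(32, 2048)                     # pooled infrared embeddings
labels = torch.arange(8).repeat_interleave(4)  # identity ids, shared across modalities
loss = cross_modal_triplet_loss(vis, ir, labels)
```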