Abstract: Traffic flow prediction is an active research area in intelligent transportation systems, and its fundamental challenge is to model the complex spatial-temporal correlations in traffic data effectively. Most existing spatial-temporal Transformer models ignore the effects of temporal trend and spatial heterogeneity when constructing spatial-temporal correlation matrices. To address this problem, a traffic flow prediction model based on a Spatial-Temporal Aware Transformer (STAFormer) is proposed. First, an improved spatial-temporal aware self-attention mechanism mines latent temporal trend and spatial heterogeneity features in traffic flow data and builds an accurate spatial-temporal correlation matrix, from which global spatial-temporal features are obtained. Then, a multi-range diffusion convolution simulates the multi-order diffusion of traffic flow through the road network to capture local spatial features. Finally, a multivariate feature fusion module adaptively fuses the captured spatial-temporal features and outputs the prediction results. Experiments on two real-world traffic datasets, PeMS04 and PeMS08, show that, compared with the recently proposed Transformer-based models RPConvformer, ASTGNN, and PDFormer, STAFormer reduces the mean absolute error by 8.0%, 6.5%, and 2.0%, respectively.
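
The multi-order diffusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula used in STAFormer, so the code below assumes the commonly used random-walk form, in which the output is a weighted sum of the node features propagated 0, 1, ..., K steps over a row-normalized adjacency matrix. The function names (`row_normalize`, `matmul`, `diffusion_conv`), the fixed weights `thetas`, and the toy three-node graph are all hypothetical; in a trained model the weights would be learned.

```python
def row_normalize(adj):
    """Random-walk transition matrix P = D^-1 A (assumed form, not from the paper)."""
    out = []
    for row in adj:
        deg = sum(row)
        out.append([w / deg if deg > 0 else 0.0 for w in row])
    return out


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def diffusion_conv(adj, x, thetas):
    """Return sum_k thetas[k] * P^k x for k = 0 .. len(thetas) - 1.

    adj    : N x N weighted adjacency matrix of the road network
    x      : N x F node features (e.g. flow readings per sensor)
    thetas : one hypothetical scalar weight per diffusion order
    """
    p = row_normalize(adj)
    h = [row[:] for row in x]                  # P^0 x
    out = [[0.0] * len(x[0]) for _ in x]
    for k, theta in enumerate(thetas):
        if k > 0:
            h = matmul(p, h)                   # advance one diffusion step: P^k x
        for i in range(len(out)):
            for j in range(len(out[0])):
                out[i][j] += theta * h[i][j]
    return out


# Toy road network: three sensors on a line, 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]                      # flow observed only at sensor 0
print(diffusion_conv(adj, x, thetas=[1.0, 0.5, 0.25]))
```

With two diffusion steps, the signal at sensor 0 reaches sensor 2, which is two hops away; higher orders therefore widen the local neighbourhood that each node aggregates from.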