Abstract: Transformer models have improved the performance of hyperspectral image (HSI) denoising. However, the original Transformer still does not fully exploit the spatial-spectral coupling in HSIs. It tends to oversmooth spatial features, which leads to the loss of small-scale structures. Moreover, it places excessive emphasis on the features of all spectral channels and neglects the differences between spectral bands. To address these problems, this paper proposes a Sparse Spatial-Spectral Transformer that makes better use of spatial-spectral coupling. In the spatial dimension, a local enhancement module refines spatial feature details and mitigates oversmoothing. In the spectral dimension, a top-k sparse self-attention mechanism adaptively selects the k most relevant spectral channel features for feature interaction, effectively capturing spatial-spectral characteristics. Denoising is then achieved by combining the Sparse Spatial-Spectral Transformer with hierarchical residual connections. On the ICVL dataset, the proposed model attains peak signal-to-noise ratios (PSNR) of 40.56 dB under Gaussian noise and 40.19 dB under complex noise, demonstrating its superior performance.
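
As an illustration of the top-k idea in the spectral dimension, the following is a minimal NumPy sketch, not the model's actual implementation. It assumes that attention is computed between spectral channels (a C x C score matrix over flattened spatial positions), that queries, keys, and values are the input itself (no learned projections), and that scores are scaled by the square root of the number of spatial positions. The function name `topk_channel_attention` and all of these choices are hypothetical.

```python
import numpy as np

def topk_channel_attention(x, k):
    """Sketch of top-k sparse self-attention across spectral channels.

    x: array of shape (C, N), with C spectral channels and N = H * W
       flattened spatial positions.
    k: number of channels each channel may attend to (1 <= k <= C).
    Returns the attended features (C, N) and the attention weights (C, C).
    """
    c, n = x.shape
    if not 1 <= k <= c:
        raise ValueError("k must satisfy 1 <= k <= C")

    # Assumption: identity projections stand in for learned Q, K, V.
    q, key, v = x, x, x

    # Channel-to-channel similarity scores, shape (C, C).
    scores = (q @ key.T) / np.sqrt(n)

    # Per row, find the k-th largest score and mask everything below it.
    kth_largest = np.partition(scores, c - k, axis=1)[:, c - k][:, None]
    masked = np.where(scores >= kth_largest, scores, -np.inf)

    # Softmax over the surviving entries; masked entries get weight 0.
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ v, weights
```

With `k` equal to the number of channels the mask keeps every entry and the sketch reduces to ordinary dense channel attention; smaller values of `k` restrict each channel to interacting with its k most similar channels.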