Abstract: The two-dimensional discrete fast Fourier transform (2D FFT) is widely used in digital image processing and is of great significance in engineering. A 2D FFT is usually computed by row-column decomposition, that is, a row-wise 1D FFT followed by a column-wise 1D FFT. Because of the limited data transmission bandwidth of field-programmable gate arrays (FPGAs) and the physical structure of the associated storage hardware, this method cannot meet the requirements of real-time processing of high-resolution images. A row FFT, transpose, row FFT scheme can reduce the waiting time of the direct memory access (DMA) controller during the computation and improve the computational efficiency of the 2D FFT, but existing implementations of matrix transposition have significant limitations: the traditional design uses load and store instructions to transpose the matrix. This paper proposes a 2D FFT scheme based on fast block transposition. By building a transposition module and a four-way parallel 1D FFT module, the scheme makes full use of the FPGA on-chip resources and thereby reduces the delay. Experiments are carried out on a Xilinx Kintex UltraScale FPGA, and different 2D FFT schemes are compared at the same clock frequency and degree of parallelism. Within the experimental error range, the proposed solution improves computational efficiency by about 15 times.
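The row FFT, transpose, row FFT dataflow can be illustrated with a minimal software sketch. It is a reference model under stated assumptions, not the FPGA design described in this paper: the function names (`fft1d`, `block_transpose`, `fft2d_row_transpose_row`, `dft2d_direct`), the tile size `block`, and the final transpose that restores the original orientation are all hypothetical choices made for illustration, and the four-way parallel 1D FFT module is not modeled.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def block_transpose(m, block=2):
    """Transpose a matrix tile by tile (tiles of block x block elements)."""
    rows, cols = len(m), len(m[0])
    out = [[0j] * rows for _ in range(cols)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Move one tile; the tile is clipped at the matrix edges.
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    out[c][r] = m[r][c]
    return out


def fft2d_row_transpose_row(m, block=2):
    """2D FFT computed with row-wise 1D FFTs only, plus transposes."""
    step1 = [fft1d(row) for row in m]       # FFT along each row
    step2 = block_transpose(step1, block)   # columns become rows
    step3 = [fft1d(row) for row in step2]   # FFT along the former columns
    # Assumption: transpose back so the output has the input's orientation.
    return block_transpose(step3, block)


def dft2d_direct(m):
    """Direct 2D DFT from the definition, used only as a correctness reference."""
    rows, cols = len(m), len(m[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    s += m[r][c] * cmath.exp(1j * angle)
            out[u][v] = s
    return out
```

In this sketch both FFT passes read their input in row order, which is the access pattern the transpose is meant to provide. Every memory access in the column direction is confined to `block_transpose`, where it is handled one tile at a time.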