Abstract: To address the poor reconstruction completeness and limited generalization of multi-view stereo reconstruction in complex scenes with uneven illumination, weak texture, or non-Lambertian surfaces, this paper proposes a multi-view stereo reconstruction algorithm based on the attention mechanism. In the feature extraction stage, the algorithm adopts a multi-scale feature extraction module built on depthwise separable convolution and self-attention, which strengthens the spatial feature relationships among multiple views while enlarging the receptive field. This improves the network's ability to represent features in complex scenes and leads to more accurate feature matching. In the cost volume regularization stage, a channel attention mechanism adaptively adjusts the weights of different channels, reducing the interference of irrelevant information, filtering background noise, and improving the generalization ability of the model. On the DTU dataset, the proposed algorithm achieves completeness and overall scores of 0.286 and 0.334, respectively (lower is better), which are 25.71% and 5.92% lower than those of the baseline algorithm CasMVSNet. The reconstructed point clouds are also structurally more complete in complex scenes than those of other state-of-the-art (SOTA) algorithms. On the Tanks and Temples intermediate dataset, the reconstructed point clouds reach a composite F-score of 61.49, indicating that the proposed algorithm has improved robustness and generalization ability.
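The channel reweighting described for the cost volume regularization stage can be illustrated with a minimal squeeze-and-excitation style sketch. This is not the paper's implementation: the function name `channel_attention`, the bottleneck matrices `w1` and `w2` (random stand-ins for learned parameters), and the toy volume size are all assumptions made for illustration.

```python
import math
import random

def channel_attention(volume, w1, w2):
    """Reweight the channels of a cost volume (squeeze-and-excitation style).

    volume: list of C channels, each a flat list of D*H*W cost values.
    w1:     R x C matrix (channel reduction), hypothetical learned weights.
    w2:     C x R matrix (channel expansion), hypothetical learned weights.
    """
    # Squeeze: global average over all depth and spatial positions per channel.
    squeezed = [sum(ch) / len(ch) for ch in volume]
    # Excitation: bottleneck layer with ReLU, then sigmoid to get weights in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [[wt * v for v in ch] for wt, ch in zip(weights, volume)]
    return scaled, weights

# Toy example: 8 channels, reduction to 2, volume of 4 x 3 x 3 positions.
random.seed(0)
C, R, N = 8, 2, 4 * 3 * 3
volume = [[random.random() for _ in range(N)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]
w2 = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]
scaled, weights = channel_attention(volume, w1, w2)
```

In a trained network the weights would be learned so that channels carrying matching evidence are emphasized and channels dominated by background noise are suppressed; here they are random and only demonstrate the data flow.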