Abstract: To address the poor estimation of object details by existing stereo matching algorithms and the dependence of supervised algorithms on large numbers of ground-truth disparity maps, this paper proposes a self-supervised stereo matching algorithm that combines deep and shallow features. The algorithm embeds Efficient Channel Attention (ECA) in the feature extraction network to extract shallow features and more expressive deep features from the image. The cost volume used to predict the initial disparities is constructed from the deep features, and the shallow features guide the refinement of the initial disparities. In addition, on top of the left-right disparity consistency loss, the loss function includes a proposed left-right feature consistency loss, which strengthens the constraint that shallow feature information places on the disparity maps and improves the robustness of the algorithm. The method is trained and evaluated on the KITTI 2015 dataset and applied to real scenes captured by the authors. Experimental results show that the proposed method achieves better results than the compared algorithms, especially in detailed regions where the disparity changes abruptly.
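
As an illustration of the left-right disparity consistency idea mentioned in the abstract, the following minimal sketch computes one common form of such a loss. The abstract does not give the exact formulation used in the paper, so several details here are assumptions: the function names `sample_row` and `lr_consistency_loss` are hypothetical, disparities are assumed to be positive and measured in pixels, a left pixel at column x is assumed to match the right pixel at column x - d, and sampling uses linear interpolation with border clamping. Real implementations typically operate on tensors in a deep learning framework; plain Python lists are used here only to keep the example self-contained.

```python
def sample_row(row, x):
    """Linearly interpolate `row` at fractional column x, clamping to the border."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)                       # floor, since x >= 0 after clamping
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def lr_consistency_loss(disp_left, disp_right):
    """Mean |d_L(x, y) - d_R(x - d_L(x, y), y)| over all pixels.

    disp_left, disp_right: lists of rows (lists of floats) of equal shape.
    Assumed convention: left pixel at column x matches right pixel at x - d.
    """
    total, count = 0.0, 0
    for row_l, row_r in zip(disp_left, disp_right):
        for x, d in enumerate(row_l):
            # Look up the right-view disparity at the matched column and
            # penalize its disagreement with the left-view disparity.
            total += abs(d - sample_row(row_r, x - d))
            count += 1
    return total / count


if __name__ == "__main__":
    left = [[2.0] * 6, [2.0] * 6]
    right_consistent = [[2.0] * 6, [2.0] * 6]
    right_shifted = [[3.0] * 6, [3.0] * 6]
    print(lr_consistency_loss(left, right_consistent))
    print(lr_consistency_loss(left, right_shifted))
```

When the two disparity maps agree at matched pixels the loss is zero, and it grows with the disagreement. The proposed left-right feature consistency loss presumably applies an analogous comparison to shallow feature maps rather than disparities, but its precise form is not stated in the abstract.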