Abstract: Monocular image depth estimation is a fundamental problem in computer vision, and the Convolutional Spatial Propagation Network (CSPN) is one of the most advanced methods for it. The dense depth maps that CSPN predicts show two problems: some objects are deformed, and the edges between objects are blurred, so that their boundaries mix. To address these problems, we improve CSPN in both its network structure and its loss function, and call the result ICSPN. The input sparse depth map is downsampled to three different sizes, and each result is added to the corresponding encoder stage and skip connection of the U-Net module, so that the network can capture the structure of objects at different scales more accurately. The original loss function is replaced with an improved loss formed by a weighted combination of a logarithmic depth error term, a depth gradient term, and a surface normal term. Experimental results on the NYU-Depth-v2 dataset show that, compared with CSPN, ICSPN reduces the root mean square error (RMSE) by 17.23% and the mean relative error (REL) by 28.07%. ICSPN makes full use of the input sparse depth map to reduce structural deformation of objects in the predicted dense depth map. In addition, supervising training with a loss that includes a gradient term reduces errors in object edge positions and mitigates boundary mixing.
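The abstract names the three loss terms but not their exact form. The sketch below is one plausible reading and is not the authors' code. It assumes the formulation commonly used for this kind of combined loss: per-pixel log(|e| + alpha) penalties on the depth error and on the x/y gradient errors, plus a normal term equal to 1 minus the cosine similarity between the surface normals (-dx, -dy, 1) of the prediction and of the ground truth. The function names `gradients` and `combined_depth_loss`, the weights `lam` and `mu`, and the offset `alpha = 0.5` are hypothetical choices for illustration. It also assumes a dense ground truth with no invalid-pixel mask.

```python
import numpy as np


def gradients(d):
    """Forward-difference gradients along x (columns) and y (rows).

    The last column/row has no forward neighbour, so its gradient is left at 0.
    """
    gx = np.zeros_like(d)
    gy = np.zeros_like(d)
    gx[:, :-1] = d[:, 1:] - d[:, :-1]
    gy[:-1, :] = d[1:, :] - d[:-1, :]
    return gx, gy


def combined_depth_loss(pred, gt, lam=1.0, mu=1.0, alpha=0.5):
    """Hypothetical weighted loss: log depth error + lam * gradient + mu * normal.

    pred, gt: 2-D arrays of depth values with the same shape.
    lam, mu, alpha: assumed hyperparameters, not values taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Logarithmic depth error term: minimal (log(alpha)) where pred == gt.
    l_depth = np.mean(np.log(np.abs(pred - gt) + alpha))

    # Gradient term: penalises mismatched depth changes, i.e. misplaced edges.
    pgx, pgy = gradients(pred)
    ggx, ggy = gradients(gt)
    l_grad = np.mean(
        np.log(np.abs(pgx - ggx) + alpha) + np.log(np.abs(pgy - ggy) + alpha)
    )

    # Surface normal term: normals built from depth gradients as (-dx, -dy, 1).
    n_pred = np.stack([-pgx, -pgy, np.ones_like(pred)], axis=-1)
    n_gt = np.stack([-ggx, -ggy, np.ones_like(gt)], axis=-1)
    cos = np.sum(n_pred * n_gt, axis=-1) / (
        np.linalg.norm(n_pred, axis=-1) * np.linalg.norm(n_gt, axis=-1)
    )
    l_normal = np.mean(1.0 - cos)

    return l_depth + lam * l_grad + mu * l_normal


# Usage: a perfect prediction reaches the minimum (1 + 2 * lam) * log(alpha).
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 5.0, size=(8, 8))
print(combined_depth_loss(gt, gt))         # equals 3 * log(0.5) with the defaults
print(combined_depth_loss(gt + 0.3, gt))   # larger: constant depth offset
```

In this reading, the gradient and normal terms ignore a constant depth offset, because such an offset leaves gradients unchanged. They respond only to errors in local shape and edge placement, which matches the abstract's claim that the gradient term targets edge position errors and boundary mixing.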