Attention mechanism and neural rendering for multi-view 3D reconstruction algorithm

Affiliation:

1.School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; 2.School of Electrical and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China

CLC number:

TP391.4

Fund Project:

Supported by the National Natural Science Foundation of China (51774235), the Key Research and Development Program of Shaanxi Province (2021GY-338), and the Science and Technology Plan Project of Beilin District, Xi'an (GX2333)

    Abstract:

    To address the poor reconstruction quality of multi-view stereo networks in challenging regions such as weakly textured or non-Lambertian surfaces, this paper first proposes a multi-scale feature extraction module built on three parallel dilated convolutions and an attention mechanism. The module enlarges the receptive field while capturing dependencies between features to obtain global context information, which strengthens feature representation in challenging regions and makes feature matching more robust. Second, an attention mechanism is introduced into the 3D CNN used for cost volume regularization so that the network focuses on the important regions of the cost volume during smoothing. In addition, a neural rendering network is built. It uses a rendering reference loss to accurately resolve the geometry and appearance information expressed by the radiance field, and a depth consistency loss to maintain geometric consistency between the multi-view stereo network and the neural rendering network, which effectively mitigates the adverse effect of a noisy cost volume on the multi-view stereo network. On the indoor DTU dataset, the algorithm achieves completeness and overall scores of 0.289 and 0.326 for point cloud reconstruction, improvements of 24.9% and 8.2% over the baseline method CasMVSNet, with high-quality reconstruction even in challenging regions. On the intermediate set of the outdoor Tanks and Temples dataset, the mean F-score of point cloud reconstruction is 60.31, a 9.9% improvement over UCS-Net, which indicates strong generalization capability.
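The abstract's point that parallel dilated convolutions enlarge the receptive field can be illustrated with a minimal 1D sketch. This is not the authors' module: the dilation rates (1, 2, 3), the 3-tap box kernel, and the helper names `effective_kernel_size`, `dilated_conv1d`, and `parallel_branches` are all assumptions made for illustration, and the attention stage is not modeled. It relies only on the standard fact that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel size)."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


# Hypothetical dilation rates; the abstract does not state the actual values.
DILATIONS = (1, 2, 3)
# Illustrative 3-tap box kernel, not learned weights.
KERNEL = [1.0, 1.0, 1.0]


def parallel_branches(signal):
    """Apply the same kernel at each dilation rate, one output per branch."""
    return {d: dilated_conv1d(signal, KERNEL, d) for d in DILATIONS}


# A unit impulse shows which positions each branch reaches.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
branches = parallel_branches(impulse)
spans = {d: effective_kernel_size(len(KERNEL), d) for d in DILATIONS}

for d in DILATIONS:
    print(f"dilation={d} span={spans[d]} response={branches[d]}")
```

Each branch uses the same number of weights, yet the branch with the largest dilation responds to inputs farthest from the center. In a real network the branch outputs would be fused (for example by concatenation) before any attention weighting; how the paper fuses them is not stated in the abstract.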

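Cite this article

Zhu Daixian (朱代先), Kong Haoran (孔浩然), Qiu Qiang (秋强), Liu Shulin (刘树林), Zhang Yali (张亚莉). Attention mechanism and neural rendering for multi-view 3D reconstruction algorithm[J]. Electronic Measurement Technology, 2024, 47(5): 158-166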

History
  • Published online: 2024-06-05