Long-Range Attention Network for Multi-View Stereo

Abstract

Learning-based multi-view stereo (MVS) has recently gained great popularity because it can efficiently infer depth maps and reconstruct fine-grained scene geometry. Previous methods decide whether corresponding pixel pairs match by computing their variance, a mostly pixel-wise measure that ignores the interdependence among pixels and is ineffective for matching texture-less or occluded regions. These false matches challenge MVS and account for most of its failure cases. To address these issues, we introduce a Long-range Attention Network (LANet) that selectively aggregates reference features to each position, capturing long-range interdependence across the entire space. As a result, similar features relate to each other regardless of their distance, propagating more guiding information for effective matching. Furthermore, we introduce a new loss that supervises the intermediate probability volume by constraining its distribution to be reasonably centered at the true depth. Extensive experiments on the large-scale DTU dataset demonstrate that the proposed LANet achieves new state-of-the-art performance, outperforming previous methods by a large margin. Our method is generic and also achieves comparable results on the outdoor Tanks and Temples dataset without any fine-tuning, which validates its generalization ability.
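The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the attention module or the loss, so the sketch assumes plain dot-product self-attention without learned projections, and a cross-entropy against a Gaussian-shaped target centered at the true depth. The function names `long_range_attention` and `depth_distribution_loss` and the `sigma` parameter are hypothetical.

```python
import numpy as np


def long_range_attention(feat):
    """Dot-product self-attention over all positions of a (C, H, W) feature map.

    Assumption: no learned query/key/value projections; a real network
    would normally include them.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                   # (N, C), N = H * W positions
    scores = x @ x.T / np.sqrt(c)                  # (N, N) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    out = weights @ x                              # aggregate from every position
    # Residual connection keeps the original local feature.
    return (x + out).T.reshape(c, h, w), weights


def depth_distribution_loss(prob, true_idx, sigma=1.0):
    """Cross-entropy between a probability volume and a soft target.

    prob:     (D, H, W) per-pixel distribution over D depth hypotheses.
    true_idx: (H, W) ground-truth depth as a (float) hypothesis index.
    Assumption: the target is a normalised Gaussian centered at true_idx.
    """
    d = prob.shape[0]
    idx = np.arange(d, dtype=np.float64)[:, None, None]
    target = np.exp(-0.5 * ((idx - true_idx[None]) / sigma) ** 2)
    target /= target.sum(axis=0, keepdims=True)
    return float(-(target * np.log(prob + 1e-8)).sum(axis=0).mean())
```

Because every position attends to every other position, the weight matrix has N × N entries for N = H × W, so a sketch like this is only practical on low-resolution feature maps.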

Cite

Text

Zhang et al. "Long-Range Attention Network for Multi-View Stereo." Winter Conference on Applications of Computer Vision, 2021.

Markdown

[Zhang et al. "Long-Range Attention Network for Multi-View Stereo." Winter Conference on Applications of Computer Vision, 2021.](https://mlanthology.org/wacv/2021/zhang2021wacv-longrange/)

BibTeX

@inproceedings{zhang2021wacv-longrange,
  title     = {{Long-Range Attention Network for Multi-View Stereo}},
  author    = {Zhang, Xudong and Hu, Yutao and Wang, Haochen and Cao, Xianbin and Zhang, Baochang},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2021},
  pages     = {3782-3791},
  url       = {https://mlanthology.org/wacv/2021/zhang2021wacv-longrange/}
}