Light3R-SfM: Towards Feed-Forward Structure-from-Motion

Abstract

We present Light3R-SfM, a feed-forward, end-to-end learnable framework for efficient large-scale Structure-from-Motion (SfM) from unconstrained image collections. Existing SfM solutions rely on costly matching and global optimization to achieve accurate 3D reconstructions; Light3R-SfM addresses this limitation through a novel latent global alignment module. This module replaces traditional global optimization with a learnable attention mechanism, effectively capturing multi-view constraints across images for robust and precise camera pose estimation. Light3R-SfM constructs a sparse scene graph via a retrieval-score-guided shortest path tree, dramatically reducing memory usage and computational overhead compared to the naive approach. Extensive experiments demonstrate that Light3R-SfM achieves competitive accuracy while significantly reducing runtime, making it well suited to real-world 3D reconstruction tasks with runtime constraints.
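
As a rough illustration of the sparse scene graph idea, the sketch below builds a shortest path tree over a small, made-up image similarity matrix. It is not the authors' implementation: the abstract only states that the tree is guided by retrieval scores, so the edge cost (1 minus similarity), the choice of image 0 as root, the use of Dijkstra's algorithm, and the names `shortest_path_tree` and `tree_edges` are all assumptions made for this example.

```python
import heapq


def shortest_path_tree(sim, root=0):
    """Return a parent list describing a shortest path tree.

    sim is a dense, symmetric matrix of pairwise retrieval scores in [0, 1].
    Assumption (not from the paper): edge cost = 1 - similarity, so highly
    similar image pairs are cheap to traverse.
    """
    n = len(sim)
    dist = [float("inf")] * n
    parent = [-1] * n          # parent[root] stays -1
    done = [False] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue           # stale heap entry
        done[u] = True
        for v in range(n):
            if v == u or done[v]:
                continue
            nd = d + (1.0 - sim[u][v])
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent


def tree_edges(parent):
    """List the (parent, child) image pairs kept in the sparse graph."""
    return [(p, v) for v, p in enumerate(parent) if p != -1]


# Hypothetical similarity scores for four images.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]
edges = tree_edges(shortest_path_tree(sim, root=0))
print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

For N images, a tree of this kind keeps N - 1 image pairs, whereas a fully connected graph has N(N - 1)/2 pairs (3 versus 6 in the toy example), which is one way to see where memory and compute savings could come from.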

Cite

Text

Elflein et al. "Light3R-SfM: Towards Feed-Forward Structure-from-Motion." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01563

Markdown

[Elflein et al. "Light3R-SfM: Towards Feed-Forward Structure-from-Motion." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/elflein2025cvpr-light3rsfm/) doi:10.1109/CVPR52734.2025.01563

BibTeX

@inproceedings{elflein2025cvpr-light3rsfm,
  title     = {{Light3R-SfM: Towards Feed-Forward Structure-from-Motion}},
  author    = {Elflein, Sven and Zhou, Qunjie and Leal-Taixé, Laura},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {16774-16784},
  doi       = {10.1109/CVPR52734.2025.01563},
  url       = {https://mlanthology.org/cvpr/2025/elflein2025cvpr-light3rsfm/}
}