IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction

Abstract

We present IncVGGT, a training-free incremental variant of VGGT that makes transformer-based 3D reconstruction feasible for long sequences in real-world applications. Vanilla VGGT relies on dense global attention, whose memory grows quadratically with sequence length and whose computation cost is excessive, making it impractical for long-sequence scenarios. Even streaming variants such as StreamVGGT still suffer from a rapidly growing cache and rising latency. IncVGGT addresses these challenges from two orthogonal directions: (1) registering and fusing overlapping frames into composite views, which reduces duplicate tokens, and (2) history-side pruning, which retains only the top-$k$ most relevant (maximum-scoring) slots together with the most recent one, bounding cache growth. This incremental, memory-efficient design keeps computation and memory usage low across arbitrarily long sequences. Compared to StreamVGGT, IncVGGT achieves large efficiency gains (e.g., on 500-frame sequences, 58.5$\times$ fewer operators, 9$\times$ lower memory, 25.7$\times$ less energy, and 4.9$\times$ faster inference) while maintaining comparable accuracy. More importantly, whereas existing baselines run out of memory beyond 300 frames (VGGT) or 500 frames (StreamVGGT), IncVGGT continues to operate smoothly even on 10k-frame inputs on an 80GB GPU, showing that the design scales to ultra-long sequences without hitting memory limits. These results highlight IncVGGT's potential for deployment on resource-constrained edge devices in long-range 3D scenarios.
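The history-side pruning rule described in the abstract (keep the top-$k$ slots plus the most recent one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `prune_history` and `run_stream` are hypothetical, and the per-slot relevance score is assumed to be a single precomputed number, since the abstract does not say how relevance is measured.

```python
from typing import List, Sequence


def prune_history(scores: Sequence[float], k: int) -> List[int]:
    """Return the sorted indices of the cache slots to keep.

    Keeps the most recent slot (last index) plus the k highest-scoring
    older slots, so at most k + 1 slots survive each step.
    The scores are an assumed stand-in for a relevance measure.
    """
    if not scores:
        return []
    latest = len(scores) - 1
    # Rank older slots by score, highest first; ties go to the newer slot.
    ranked = sorted(range(latest), key=lambda i: (scores[i], i), reverse=True)
    return sorted(ranked[:max(k, 0)] + [latest])


def run_stream(frame_scores: Sequence[float], k: int) -> List[int]:
    """Simulate a stream and return the cache size after each frame."""
    cache: List[float] = []
    sizes: List[int] = []
    for score in frame_scores:
        cache.append(score)                   # new slot for the incoming frame
        keep = prune_history(cache, k)        # choose which slots survive
        cache = [cache[i] for i in keep]
        sizes.append(len(cache))
    return sizes


if __name__ == "__main__":
    print(prune_history([0.1, 0.9, 0.3, 0.5, 0.2], k=2))
    print(run_stream([0.1, 0.9, 0.3, 0.5, 0.2, 0.7], k=2))
```

Under these assumptions the cache never holds more than $k + 1$ slots regardless of how many frames arrive, which is the bounded-growth property the abstract claims.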

Cite

Text

Fang et al. "IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction." International Conference on Learning Representations, 2026.

Markdown

[Fang et al. "IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/fang2026iclr-incvggt/)

BibTeX

@inproceedings{fang2026iclr-incvggt,
  title     = {{IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction}},
  author    = {Fang, Keyu and Zhou, Changchun and Fu, Yuzhe and Li, Hai Helen and Chen, Yiran},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/fang2026iclr-incvggt/}
}