ARFlow: Auto-Regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting

Abstract

Optical flow estimation is a fundamental computer vision task that predicts per-pixel displacements between consecutive images. Recent works attempt to exploit temporal cues to improve estimation performance. However, their temporal modeling is restricted to short video sequences because of the prohibitive computational cost, so they suffer from limited temporal receptive fields. Moreover, their group-wise paradigm, which processes each group of frames in one forward pass, undermines inter-group information exchange and leads to only modest performance improvement. To address these problems, we propose ARFlow, a novel multi-frame optical flow network based on an auto-regressive paradigm. Unlike previous multi-frame methods, our method scales to arbitrary-length videos with marginal computational overhead. Specifically, we design an Auto-regressive Flow Initialization (AFI) module and an Auto-regressive Multi-stride Flow Refinement (AMFR) module to forecast the next-frame flow based on multi-stride history observations. ARFlow achieves state-of-the-art performance, ranking 1st on both the KITTI-2015 and Spring official benchmarks and 2nd on the MPI-Sintel (Final) benchmark among all open-sourced methods. Furthermore, owing to its auto-regressive nature, our method generalizes to videos of arbitrary length with a constant GPU memory usage of 2.1 GB.
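
The auto-regressive idea in the abstract (forecast the next-frame flow from a bounded set of multi-stride history observations, so memory stays constant as the video grows) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `multi_stride_history`, `autoregressive_flow`, and `predict`, as well as the stride set `(1, 2, 4)`, are assumptions chosen for illustration, and `predict` stands in for whatever network (such as the AFI and AMFR modules) would produce the flow.

```python
from collections import deque
from typing import Any, Callable, Iterable, Iterator, List, Sequence


def multi_stride_history(history: Sequence[Any], strides: Sequence[int]) -> List[Any]:
    """Pick past flows at the given strides (1 = most recent).

    Strides that reach further back than the available history are skipped.
    """
    return [history[-s] for s in strides if s <= len(history)]


def autoregressive_flow(
    frames: Iterable[Any],
    predict: Callable[[Any, Any, List[Any]], Any],
    strides: Sequence[int] = (1, 2, 4),  # hypothetical stride set, not from the paper
) -> Iterator[Any]:
    """Yield one flow per consecutive frame pair.

    Only the last max(strides) flows are retained, so the state kept by this
    loop does not grow with the number of frames.
    """
    history: deque = deque(maxlen=max(strides))  # bounded buffer of past flows
    it = iter(frames)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty video: nothing to estimate
    for frame in it:
        context = multi_stride_history(history, strides)
        flow = predict(prev, frame, context)  # hypothetical model call
        history.append(flow)  # oldest entry is dropped automatically when full
        yield flow
        prev = frame
```

Because the function is a generator over an iterable of frames, it can be driven by a video reader of any length; the only state carried between steps is the previous frame and the bounded history buffer.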

Cite

Text

Liu et al. "ARFlow: Auto-Regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting." International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "ARFlow: Auto-Regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-arflow/)

BibTeX

@inproceedings{liu2026iclr-arflow,
  title     = {{ARFlow: Auto-Regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting}},
  author    = {Liu, Jiuming and Liu, Mengmeng and Zhu, Siting and Zhang, Yunpeng and Li, Jiangtao and Yang, Michael Ying and Nex, Francesco and Cheng, Hao and Wang, Hesheng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-arflow/}
}