FlowFormer: A Transformer Architecture for Optical Flow

Abstract

We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.144 and 2.183 average end-point-error (AEPE) on the clean and final pass, a 17.6% and 11.6% error reduction from the best published result (1.388 and 2.47). In addition, FlowFormer also achieves strong generalization performance. Without being trained on Sintel, FlowFormer achieves 0.95 AEPE on the Sintel training set clean pass, outperforming the best published result (1.29) by 26.9%.
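The abstract describes a three-stage pipeline: build a 4D all-pairs cost volume, encode its tokens into a cost memory, then decode the memory recurrently with flow-dependent cost queries. The sketch below is a minimal, hedged illustration of that data flow under our own assumptions; it is not the authors' implementation, and the module names (`build_cost_volume`, `CostEncoder`, `RecurrentDecoder`) and a plain `nn.TransformerEncoder` standing in for the AGT layers are hypothetical.

```python
# Minimal PyTorch sketch of the pipeline described in the abstract
# (cost volume -> cost tokens -> cost memory -> recurrent decoding).
# Illustrative only; all names and hyperparameters are assumptions.
import torch
import torch.nn as nn

def build_cost_volume(feat1, feat2):
    # All-pairs correlation between two feature maps gives a 4D cost
    # volume of shape (B, H1*W1, H2, W2).
    b, c, h, w = feat1.shape
    f1 = feat1.flatten(2).transpose(1, 2)           # (B, H1*W1, C)
    f2 = feat2.flatten(2)                           # (B, C, H2*W2)
    cost = torch.bmm(f1, f2) / c**0.5               # (B, H1*W1, H2*W2)
    return cost.view(b, h * w, h, w)

class CostEncoder(nn.Module):
    """Tokenizes the cost volume and encodes the tokens into a cost memory."""
    def __init__(self, dim=128, depth=3, patch=4):
        super().__init__()
        # Each source pixel's (H2, W2) cost map is patchified into tokens.
        self.tokenize = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        # Plain transformer layers as a stand-in for the AGT layers.
        self.layers = nn.TransformerEncoder(layer, depth)

    def forward(self, cost):                        # (B, H1*W1, H2, W2)
        b, n, h, w = cost.shape
        tokens = self.tokenize(cost.reshape(b * n, 1, h, w))
        tokens = tokens.flatten(2).transpose(1, 2)  # (B*N, T, dim)
        return self.layers(tokens)                  # cost memory

class RecurrentDecoder(nn.Module):
    """Iteratively queries the cost memory and refines the flow estimate."""
    def __init__(self, dim=128, iters=4):
        super().__init__()
        self.iters = iters
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.query_proj = nn.Linear(2, dim)         # query built from the current flow
        self.to_delta = nn.Linear(dim, 2)           # residual flow update

    def forward(self, memory, init_flow):           # memory: (B*N, T, dim), init_flow: (B*N, 2)
        flow = init_flow
        for _ in range(self.iters):
            q = self.query_proj(flow).unsqueeze(1)  # query depends on the current estimate
            ctx, _ = self.attn(q, memory, memory)
            flow = flow + self.to_delta(ctx.squeeze(1))
        return flow

if __name__ == "__main__":
    feat1, feat2 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    cost = build_cost_volume(feat1, feat2)
    memory = CostEncoder()(cost)
    flow = RecurrentDecoder()(memory, torch.zeros(32 * 32, 2))
    print(flow.shape)                               # (1024, 2): one flow vector per source pixel
```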

Cite

Text

Huang et al. "FlowFormer: A Transformer Architecture for Optical Flow." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19790-1_40

Markdown

[Huang et al. "FlowFormer: A Transformer Architecture for Optical Flow." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/huang2022eccv-flowformer/) doi:10.1007/978-3-031-19790-1_40

BibTeX

@inproceedings{huang2022eccv-flowformer,
  title     = {{FlowFormer: A Transformer Architecture for Optical Flow}},
  author    = {Huang, Zhaoyang and Shi, Xiaoyu and Zhang, Chao and Wang, Qiang and Cheung, Ka Chun and Qin, Hongwei and Dai, Jifeng and Li, Hongsheng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19790-1_40},
  url       = {https://mlanthology.org/eccv/2022/huang2022eccv-flowformer/}
}