SlowFast Networks for Video Recognition

Abstract

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at a low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at a high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet it can still learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pinpointed as contributions of our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks: Kinetics, Charades, and AVA. Code has been made available at: https://github.com/facebookresearch/SlowFast.
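The two-pathway idea described above can be illustrated with a minimal sketch of how the pathways might sample frames and size their channels. This is not the authors' implementation. The function names and the parameters `slow_stride`, `alpha` (frame-rate ratio between the Fast and Slow pathways), and `beta` (channel ratio of the Fast pathway relative to the Slow one) are assumptions chosen for illustration, as are their default values.

```python
def sample_pathway_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow_indices, fast_indices) for a clip of num_frames frames.

    slow_stride: temporal stride of the Slow pathway (illustrative assumption).
    alpha: how many times denser the Fast pathway samples (illustrative assumption).
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    # Slow pathway: few frames, low frame rate.
    slow = list(range(0, num_frames, slow_stride))
    # Fast pathway: alpha times more frames, high frame rate.
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def fast_pathway_channels(slow_channels, beta=0.125):
    """Channel width of the lightweight Fast pathway (beta is an assumption)."""
    return max(1, int(slow_channels * beta))


slow_idx, fast_idx = sample_pathway_indices(num_frames=64)
print(len(slow_idx), len(fast_idx))   # 4 32
print(fast_pathway_channels(64))      # 8
```

Under these assumed settings the Fast pathway sees eight times as many frames as the Slow pathway but carries one eighth of the channels, which is the sense in which it can be high in temporal resolution and still lightweight.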

Cite

Text

Feichtenhofer et al. "SlowFast Networks for Video Recognition." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00630

Markdown

[Feichtenhofer et al. "SlowFast Networks for Video Recognition." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/feichtenhofer2019iccv-slowfast/) doi:10.1109/ICCV.2019.00630

BibTeX

@inproceedings{feichtenhofer2019iccv-slowfast,
  title     = {{SlowFast Networks for Video Recognition}},
  author    = {Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00630},
  url       = {https://mlanthology.org/iccv/2019/feichtenhofer2019iccv-slowfast/}
}