Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

Abstract

We present Sequential Attend, Infer, Repeat (SQAIR), an interpretable deep generative model for image sequences. It can reliably discover and track objects through a sequence, and it can also conditionally generate future frames, thereby simulating the expected motion of objects. This is achieved by explicitly encoding object numbers, locations, and appearances in the latent variables of the model. SQAIR retains all the strengths of its predecessor, Attend, Infer, Repeat (AIR, Eslami et al. 2016), including unsupervised learning, made possible by the inductive biases present in the model structure. We use a moving multi-MNIST dataset to show the limitations of AIR in detecting overlapping or partially occluded objects, and show how SQAIR overcomes them by leveraging the temporal consistency of objects. Finally, we also apply SQAIR to real-world pedestrian CCTV data, where it learns to reliably detect, track, and generate walking pedestrians with no supervision.
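The structured latent space described above can be illustrated with a minimal sketch. The variable names `z_pres`, `z_where`, and `z_what` follow the conventions of the AIR/SQAIR papers; the data layout here is purely illustrative and is an assumption, not the authors' implementation:

```python
import random
from dataclasses import dataclass

@dataclass
class ObjectLatent:
    """Per-object latent variables, one set per object per frame."""
    present: bool            # z_pres: does this object exist in the frame?
    where: tuple             # z_where: (x, y, scale) of the attention window
    what: list               # z_what: appearance code (a real-valued vector)

def sample_frame_latents(num_objects, appearance_dim=4, seed=0):
    """Sample a toy set of object latents for a single frame.

    The number of ObjectLatent entries encodes the object count, so the
    object number, locations, and appearances are all explicit in the
    latent representation, mirroring the structure the abstract describes.
    """
    rng = random.Random(seed)
    return [
        ObjectLatent(
            present=True,
            where=(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.2, 1.0)),
            what=[rng.gauss(0, 1) for _ in range(appearance_dim)],
        )
        for _ in range(num_objects)
    ]
```

In SQAIR these latents are inferred frame by frame, with objects propagated from the previous frame or newly discovered, which is what gives the model its tracking behaviour.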

Cite

Text

Kosiorek et al. "Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects." Neural Information Processing Systems, 2018.

Markdown

[Kosiorek et al. "Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/kosiorek2018neurips-sequential/)

BibTeX

@inproceedings{kosiorek2018neurips-sequential,
  title     = {{Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects}},
  author    = {Kosiorek, Adam and Kim, Hyunjik and Teh, Yee Whye and Posner, Ingmar},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {8606--8616},
  url       = {https://mlanthology.org/neurips/2018/kosiorek2018neurips-sequential/}
}