Learning Disentangled Representations of Videos with Missing Data

Abstract

Missing data poses significant challenges when learning representations of video sequences. We present the Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable, disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object, and imputes each object's trajectory where data is missing. On a moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code is available at https://github.com/Rose-STL-Lab/DIVE.
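
To make the idea of a per-object missingness factor concrete, the following is a minimal sketch and not the DIVE implementation. It assumes a hypothetical `ObjectLatents` container for the factors named in the abstract and a hypothetical `impute_trajectory` helper that fills missing pose observations by constant-velocity extrapolation. DIVE itself learns its dynamics with a deep generative model; the extrapolation rule here is a simple stand-in chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pose = Tuple[float, float]  # assumed 2D object position (x, y)


@dataclass
class ObjectLatents:
    """Hypothetical per-object factors, named after the abstract."""
    static_appearance: List[float]         # shared across all frames
    dynamic_appearance: List[List[float]]  # one vector per frame
    pose: List[Pose]                       # one pose per frame
    missing: List[bool]                    # True where the frame was unobserved


def impute_trajectory(
    observed: List[Optional[Pose]],
) -> Tuple[List[Pose], List[bool]]:
    """Fill missing poses (None) and return the missingness flags.

    Stand-in rule: extrapolate at constant velocity from the last two
    poses, fall back to repeating the last pose, then to the origin.
    """
    poses: List[Pose] = []
    missing: List[bool] = []
    for obs in observed:
        is_missing = obs is None
        if obs is not None:
            poses.append(obs)
        elif len(poses) >= 2:
            (x1, y1), (x2, y2) = poses[-2], poses[-1]
            poses.append((2 * x2 - x1, 2 * y2 - y1))
        elif len(poses) == 1:
            poses.append(poses[-1])
        else:
            poses.append((0.0, 0.0))
        missing.append(is_missing)
    return poses, missing


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 1.0), None, None, (4.0, 4.0)]
    poses, missing = impute_trajectory(track)
    print(poses)
    print(missing)
```

Keeping the missingness flags alongside the imputed poses mirrors the abstract's point that missingness is modeled as its own factor rather than being folded into appearance or pose.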

Cite

Text

Comas et al. "Learning Disentangled Representations of Videos with Missing Data." Neural Information Processing Systems, 2020.

Markdown

[Comas et al. "Learning Disentangled Representations of Videos with Missing Data." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/comas2020neurips-learning/)

BibTeX

@inproceedings{comas2020neurips-learning,
  title     = {{Learning Disentangled Representations of Videos with Missing Data}},
  author    = {Comas, Armand and Zhang, Chi and Feric, Zlatan and Camps, Octavia and Yu, Rose},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/comas2020neurips-learning/}
}