An End-to-End Generative Framework for Video Segmentation and Recognition
Abstract
We describe an end-to-end generative approach for the segmentation and recognition of human activities. In this approach, a visual representation based on reduced Fisher Vectors is combined with a structured temporal model for recognition. We show that the statistical properties of Fisher Vectors make them an especially suitable front-end for generative models such as Gaussian mixtures. The system is evaluated both for the recognition of complex activities and for their parsing into action units. Using a variety of video datasets ranging from human cooking activities to animal behaviors, our experiments demonstrate that the resulting architecture outperforms state-of-the-art approaches on larger datasets, i.e., when a sufficient amount of data is available for training structured generative models.
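A minimal sketch of the pipeline described in the abstract (illustrative only, not the authors' implementation): frame-level descriptors are dimensionality-reduced, encoded as Fisher Vectors over a small GMM codebook, and the per-frame features are then modeled with a Gaussian HMM serving as the structured temporal model. It assumes scikit-learn for PCA and the codebook, hmmlearn for the HMM, and synthetic descriptors standing in for real video features; all dimensions and parameter choices are assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from hmmlearn import hmm

def fisher_vector(descriptors, gmm):
    """Fisher Vector (mean and variance derivatives) for one frame's descriptors."""
    q = gmm.predict_proba(descriptors)                      # (T, K) soft assignments
    T = descriptors.shape[0]
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    diff = (descriptors[:, None, :] - mu) / np.sqrt(var)    # (T, K, D) whitened residuals
    d_mu = (q[:, :, None] * diff).sum(0) / (T * np.sqrt(w)[:, None])
    d_var = (q[:, :, None] * (diff**2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    fv = np.hstack([d_mu.ravel(), d_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                  # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                # L2 normalization

# Hypothetical frame descriptors: one (n_points, 64) array per video frame.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(30, 64)) for _ in range(200)]

# Reduce descriptor dimensionality, then learn a small GMM codebook.
pca = PCA(n_components=16).fit(np.vstack(frames))
reduced = [pca.transform(f) for f in frames]
codebook = GaussianMixture(n_components=8, covariance_type='diag',
                           random_state=0).fit(np.vstack(reduced))

# Reduced Fisher Vector sequence for the video: (n_frames, 2 * K * D).
X = np.vstack([fisher_vector(f, codebook) for f in reduced])

# Structured temporal model: an HMM whose hidden states correspond to action
# units; decoding the state sequence yields a segmentation of the activity.
model = hmm.GaussianHMM(n_components=5, covariance_type='diag',
                        n_iter=20, random_state=0).fit(X)
action_units = model.predict(X)                             # per-frame unit labels

The sketch uses an unconstrained HMM topology for brevity; a left-to-right topology over action units would more closely mirror a structured parsing of complex activities.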
Cite
Text
Kuehne et al. "An End-to-End Generative Framework for Video Segmentation and Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016. doi:10.1109/WACV.2016.7477701
Markdown
[Kuehne et al. "An End-to-End Generative Framework for Video Segmentation and Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016.](https://mlanthology.org/wacv/2016/kuehne2016wacv-end/) doi:10.1109/WACV.2016.7477701
BibTeX
@inproceedings{kuehne2016wacv-end,
title = {{An End-to-End Generative Framework for Video Segmentation and Recognition}},
author = {Kuehne, Hilde and Gall, Juergen and Serre, Thomas},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2016},
pages = {1-8},
doi = {10.1109/WACV.2016.7477701},
url = {https://mlanthology.org/wacv/2016/kuehne2016wacv-end/}
}