Visual Scene Representations: Scaling and Occlusion in Convolutional Architectures
Abstract
We study the structure of representations, defined as approximations of minimal sufficient statistics that are maximal invariants to nuisance factors, for visual data subject to scaling and occlusion of the line of sight. We derive analytical expressions for such representations and show that, under certain restrictive assumptions, they are related to features commonly used in the computer vision community. This link highlights the conditions tacitly assumed by these descriptors, and also suggests ways to improve and generalize them. This new interpretation draws connections to the classical theories of sampling, hypothesis testing, and group invariance.
Cite
Text

Soatto et al. "Visual Scene Representations: Scaling and Occlusion in Convolutional Architectures." International Conference on Learning Representations, 2015.

Markdown

[Soatto et al. "Visual Scene Representations: Scaling and Occlusion in Convolutional Architectures." International Conference on Learning Representations, 2015.](https://mlanthology.org/iclr/2015/soatto2015iclr-visual-a/)

BibTeX
@inproceedings{soatto2015iclr-visual-a,
  title = {{Visual Scene Representations: Scaling and Occlusion in Convolutional Architectures}},
  author = {Soatto, Stefano and Dong, Jingming and Karianakis, Nikolaos},
  booktitle = {International Conference on Learning Representations},
  year = {2015},
  url = {https://mlanthology.org/iclr/2015/soatto2015iclr-visual-a/}
}