Learning Invariance Through Imitation

Abstract

Supervised methods for learning an embedding aim to map high-dimensional images to a space in which perceptually similar observations have high measurable similarity. Most approaches rely on binary similarity, typically defined by class membership, where labels are expensive to obtain and/or difficult to define. In this paper we propose crowd-sourcing similar images by soliciting human imitations. We exploit temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations. We introduce two methods for learning nonlinear, invariant mappings that exploit graded similarities. We learn a model that is highly effective at matching people in similar pose. It exhibits remarkable invariance to identity, clothing, background, lighting, shift and scale.
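The abstract does not spell out the two learning methods, but the core idea of a nonlinear mapping trained on graded pairwise similarities can be sketched as a DrLIM-style contrastive objective in which the usual binary same/different label is replaced by a graded weight s in [0, 1]. The sketch below is an illustration under that assumption, not the authors' exact formulation; the function name, margin value, and loss form are hypothetical.

import torch
import torch.nn.functional as F

def graded_contrastive_loss(z1, z2, s, margin=1.0):
    # z1, z2: (N, D) embeddings of the two images in each pair.
    # s: (N,) graded similarity in [0, 1] (1 = strong match, 0 = dissimilar).
    d = F.pairwise_distance(z1, z2)               # Euclidean distance per pair
    pull = s * d.pow(2)                           # attract pairs, weighted by similarity
    push = (1.0 - s) * F.relu(margin - d).pow(2)  # repel pairs, up to the margin
    return (pull + push).mean()

# Example: 8 pairs of 32-d embeddings with graded labels.
z1 = torch.randn(8, 32, requires_grad=True)
z2 = torch.randn(8, 32, requires_grad=True)
s = torch.rand(8)
loss = graded_contrastive_loss(z1, z2, s)
loss.backward()

With binary labels (s restricted to {0, 1}) this reduces to the standard contrastive loss; graded values let weakly similar pairs exert a proportionally weaker pull, which matches the paper's use of temporal coherence to derive degrees of similarity.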

Cite

Text

Taylor et al. "Learning Invariance Through Imitation." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2011. doi:10.1109/CVPR.2011.5995538

Markdown

[Taylor et al. "Learning Invariance Through Imitation." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2011.](https://mlanthology.org/cvpr/2011/taylor2011cvpr-learning/) doi:10.1109/CVPR.2011.5995538

BibTeX

@inproceedings{taylor2011cvpr-learning,
  title     = {{Learning Invariance Through Imitation}},
  author    = {Taylor, Graham W. and Spiro, Ian and Bregler, Christoph and Fergus, Rob},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2011},
  pages     = {2729--2736},
  doi       = {10.1109/CVPR.2011.5995538},
  url       = {https://mlanthology.org/cvpr/2011/taylor2011cvpr-learning/}
}