Unsupervised Learning of Spatiotemporally Coherent Metrics

Abstract

Current state-of-the-art classification and detection algorithms train deep convolutional networks using labeled data. In this work we study unsupervised feature learning with convolutional networks in the context of temporally coherent unlabeled data. We focus on feature learning from unlabeled video data, using the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We establish a connection between slow feature learning and metric learning. Using this connection we define "temporal coherence", a criterion that can be used to select hyper-parameters automatically. In a transfer learning experiment, we show that the resulting encoder can be used to define a more semantically coherent metric without the use of labeled data.
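
The objective sketched in the abstract combines three ingredients: reconstruction of each frame, a sparsity penalty on the codes, and a slowness (temporal coherence) term that penalizes differences between the codes of adjacent frames. The snippet below is a minimal illustration of how such a loss might be composed; it uses a simplified fully-connected auto-encoder rather than the paper's convolutional pooling architecture, and the layer sizes, penalty forms, and weights `alpha` and `beta` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyAutoEncoder(nn.Module):
    """Simplified (non-convolutional) stand-in for a pooling auto-encoder."""

    def __init__(self, input_dim=1024, code_dim=256):
        super().__init__()
        self.encoder = nn.Linear(input_dim, code_dim)
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        z = F.relu(self.encoder(x))   # non-negative code
        x_hat = self.decoder(z)
        return z, x_hat


def coherence_loss(model, x_t, x_tp1, alpha=0.1, beta=0.1):
    """Reconstruction + sparsity on the codes + slowness across adjacent frames.

    x_t, x_tp1 : batches of flattened adjacent video frames (assumed preprocessed).
    alpha, beta: illustrative weights for the sparsity and slowness penalties.
    """
    z_t, xh_t = model(x_t)
    z_tp1, xh_tp1 = model(x_tp1)
    recon = F.mse_loss(xh_t, x_t) + F.mse_loss(xh_tp1, x_tp1)
    sparsity = z_t.abs().mean() + z_tp1.abs().mean()
    slowness = (z_t - z_tp1).abs().mean()  # L1 distance between codes of adjacent frames
    return recon + alpha * sparsity + beta * slowness


if __name__ == "__main__":
    # Illustrative usage with random "adjacent frame" batches.
    model = TinyAutoEncoder()
    x_t, x_tp1 = torch.randn(8, 1024), torch.randn(8, 1024)
    print(coherence_loss(model, x_t, x_tp1).item())
```

Minimizing a loss of this shape encourages codes that reconstruct each frame, are sparse, and change slowly over time, which is the intuition behind using the trained encoder to define a semantically coherent metric.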

Cite

Text

Goroshin et al. "Unsupervised Learning of Spatiotemporally Coherent Metrics." International Conference on Computer Vision, 2015. doi:10.1109/ICCV.2015.465

Markdown

[Goroshin et al. "Unsupervised Learning of Spatiotemporally Coherent Metrics." International Conference on Computer Vision, 2015.](https://mlanthology.org/iccv/2015/goroshin2015iccv-unsupervised/) doi:10.1109/ICCV.2015.465

BibTeX

@inproceedings{goroshin2015iccv-unsupervised,
  title     = {{Unsupervised Learning of Spatiotemporally Coherent Metrics}},
  author    = {Goroshin, Ross and Bruna, Joan and Tompson, Jonathan and Eigen, David and LeCun, Yann},
  booktitle = {International Conference on Computer Vision},
  year      = {2015},
  doi       = {10.1109/ICCV.2015.465},
  url       = {https://mlanthology.org/iccv/2015/goroshin2015iccv-unsupervised/}
}