Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data

Xuhui Jia, Kai Han, Yukun Zhu, Bradley Green

ICCV 2021 pp. 610-619

doi:10.1109/ICCV48922.2021.00065

Abstract

This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. To avoid over-fitting the learnt embedding to labelled data, we take inspiration from self-supervised representation learning by noise-contrastive estimation and extend it to jointly handle labelled and unlabelled data. In particular, we propose using category discrimination on labelled data and cross-modal discrimination on multi-modal data to augment the instance discrimination used in conventional contrastive learning approaches. We further employ the Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data, which helps to better predict cluster assignments. We thoroughly evaluate our framework on the large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound and on the image benchmarks CIFAR10, CIFAR100 and ImageNet, obtaining state-of-the-art results.
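The WTA hashing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `wta_hash` and `pairwise_pseudo_labels`, the parameters `num_hashes`, `window` and `threshold`, and the rule of thresholding the fraction of matching codes are all assumptions made for illustration. The sketch only shows the general idea of Winner-Take-All hashing (record which entry is largest inside a random subset of dimensions) and how agreement between codes could be turned into binary pairwise pseudo labels.

```python
import numpy as np


def wta_hash(embeddings, num_hashes=16, window=4, seed=0):
    """Return an (n, num_hashes) array of WTA codes for (n, d) embeddings.

    Hypothetical helper: each code is the position of the largest value
    among `window` randomly chosen dimensions.
    """
    rng = np.random.default_rng(seed)
    n, d = embeddings.shape
    codes = np.empty((n, num_hashes), dtype=np.int64)
    for h in range(num_hashes):
        # Random permutation of the dimensions; keep the first `window`.
        idx = rng.permutation(d)[:window]
        # The code is the index of the maximum inside that window.
        codes[:, h] = np.argmax(embeddings[:, idx], axis=1)
    return codes


def pairwise_pseudo_labels(embeddings, threshold=0.5, **kwargs):
    """Return an (n, n) 0/1 matrix; 1 means 'assumed same cluster'.

    Assumption: a pair is labelled positive when the fraction of
    matching WTA codes reaches `threshold`.
    """
    codes = wta_hash(embeddings, **kwargs)
    # Fraction of hash functions on which each pair of samples agrees.
    agreement = (codes[:, None, :] == codes[None, :, :]).mean(axis=2)
    return (agreement >= threshold).astype(np.int64)


if __name__ == "__main__":
    x = np.arange(8, dtype=float)
    # Row 1 is a rescaled copy of row 0; row 2 reverses the ordering.
    emb = np.stack([x, 2 * x + 1, -x])
    print(pairwise_pseudo_labels(emb))
```

Because WTA codes depend only on the rank order of the embedding entries, a rescaled copy of a vector receives identical codes, while a vector with the opposite ordering shares none. That rank-based comparison is what makes the codes a plausible source of pairwise pseudo labels, under the assumptions stated above.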


Cite

Text

Jia et al. "Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00065

Markdown

[Jia et al. "Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/jia2021iccv-joint/) doi:10.1109/ICCV48922.2021.00065

BibTeX

@inproceedings{jia2021iccv-joint,
  title     = {{Joint Representation Learning and Novel Category Discovery on Single- and Multi-Modal Data}},
  author    = {Jia, Xuhui and Han, Kai and Zhu, Yukun and Green, Bradley},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {610--619},
  doi       = {10.1109/ICCV48922.2021.00065},
  url       = {https://mlanthology.org/iccv/2021/jia2021iccv-joint/}
}