3DQ-Nets: Visual Concepts Emerge in Pose Equivariant 3D Quantized Neural Scene Representations
Abstract
Concept learning lies at the very heart of intelligence, providing organizing principles with which to comprehend the world (6). Most computer vision models learn concept classifiers or detectors using labelled examples of object boxes, poses and categories. Self-supervised or unsupervised approaches mostly focus on (pre)training CNNs on auxiliary pretext tasks to lessen the need for human labels for a downstream recognition task. The visual "concepts" learnt from pretext tasks are implicit, represented as distributed neural CNN activations (7). We see the following limitations with representing concepts (solely) as neural activations across multiple layers of a deep network: i) visual memory and computation are not separated (3), which means that computation increases exponentially with the number of visual concepts learnt, ii) concepts are not stored explicitly and cannot be referred to or retrieved on demand, iii) the number of concepts cannot grow automatically with new visual experiences; rather, it is fixed by the processing architecture, which is at odds with the idea that animals are capable of spontaneous concept instantiation in novel scenes (2), iv) concepts cannot be mentally manipulated by imagining variations, transformations or mental simulations (8), v) concepts have no spatial extent and are hard to use for spatial reasoning.
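Limitations ii) and iii) above contrast distributed activations with an explicit, addressable concept memory. A minimal sketch of such a memory is a growable prototype dictionary that quantizes feature vectors to their nearest stored prototype and instantiates a new prototype when nothing matches; the class name, distance metric, and threshold below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class ConceptDictionary:
    """Sketch of an external, growable concept memory (illustrative only).

    Feature vectors (e.g. pooled object features) are quantized to the
    nearest stored prototype; if no prototype is close enough, a new
    concept is spontaneously instantiated. This separates memory (the
    prototype table) from computation (the nearest-neighbour lookup).
    """

    def __init__(self, dim, threshold=0.5):
        self.threshold = threshold            # max distance to reuse a prototype
        self.prototypes = np.empty((0, dim))  # concept memory, starts empty

    def quantize(self, feature):
        """Return the index of the matching prototype, creating one if needed."""
        if len(self.prototypes) > 0:
            dists = np.linalg.norm(self.prototypes - feature, axis=1)
            idx = int(np.argmin(dists))
            if dists[idx] < self.threshold:
                return idx                    # retrieve an existing concept
        # no close match: instantiate a new concept
        self.prototypes = np.vstack([self.prototypes, feature[None, :]])
        return len(self.prototypes) - 1

    def retrieve(self, idx):
        """Concepts are addressable and can be fetched on demand."""
        return self.prototypes[idx]
```

Two nearby features then map to the same concept index, while a distant feature allocates a fresh one, so the number of concepts is driven by experience rather than by the architecture.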
Cite
Text
Prabhudesai et al. "3DQ-Nets: Visual Concepts Emerge in Pose Equivariant 3D Quantized Neural Scene Representations." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00202
BibTeX
@inproceedings{prabhudesai2020cvprw-3dqnets,
title = {{3DQ-Nets: Visual Concepts Emerge in Pose Equivariant 3D Quantized Neural Scene Representations}},
author = {Prabhudesai, Mihir and Lal, Shamit and Tung, Hsiao-Yu Fish and Harley, Adam W. and Potdar, Shubhankar and Fragkiadaki, Katerina},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2020},
pages = {1567-1570},
doi = {10.1109/CVPRW50498.2020.00202},
url = {https://mlanthology.org/cvprw/2020/prabhudesai2020cvprw-3dqnets/}
}