A3D: Studying Pretrained Representations with Programmable Datasets

Abstract

Rendered images have been used to debug models, study inductive biases, and understand transfer learning. To scale up rendered datasets, we construct a pipeline with 40 classes of images, including furniture and consumer products, backed by 48,716 distinct object models, 480 environments, and 563 materials. We can easily vary dataset diversity along four axes (object diversity, environment, material, and camera angle), making the dataset "programmable". Using this ability, we systematically study how these axes of data characteristics influence pretrained representations. We generate 21 datasets by reducing diversity along different axes, and study performance on five downstream tasks. We find that reducing environment diversity has the biggest impact on performance and is the hardest to recover after fine-tuning. We corroborate this by visualizing the models' representations, finding that models trained on diverse environments learn more visually meaningful features.

Cite

Text

Wang et al. "A3D: Studying Pretrained Representations with Programmable Datasets." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00535

Markdown

[Wang et al. "A3D: Studying Pretrained Representations with Programmable Datasets." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/wang2022cvprw-a3d/) doi:10.1109/CVPRW56347.2022.00535

BibTeX

@inproceedings{wang2022cvprw-a3d,
  title     = {{A3D: Studying Pretrained Representations with Programmable Datasets}},
  author    = {Wang, Ye and Mu, Norman and Grandi, Daniele and Savva, Nicolas and Steinhardt, Jacob},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {4877--4885},
  doi       = {10.1109/CVPRW56347.2022.00535},
  url       = {https://mlanthology.org/cvprw/2022/wang2022cvprw-a3d/}
}