GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights

Abstract

Differentiable one-shot neural architecture search methods have recently become popular since they can exploit weight-sharing to efficiently search in large architectural search spaces. These methods traditionally perform a continuous relaxation of the discrete search space to search for an optimal architecture. However, they suffer from large memory requirements, making their application to parameter-heavy architectures like transformers difficult. Recently, single-path one-shot methods have been introduced which often use weight entanglement to alleviate this issue by sampling the weights of the sub-networks from the largest model, which is itself the supernet. In this work, we propose a continuous relaxation of weight entanglement-based architectural representation. Our Gradient-based Vision Transformer Search with Entangled Weights (GraViT-E) combines the best properties of both differentiable one-shot NAS and weight entanglement. We observe that our method imparts much better regularization properties and memory efficiency to the trained supernet. We study three one-shot optimizers on the Vision Transformer search space and observe that our method outperforms existing baselines on multiple datasets while being upto 35% more parameter efficient on ImageNet-1k.

Cite

Text

Sukthanker et al. "GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights." NeurIPS 2022 Workshops: MetaLearn, 2022.

Markdown

[Sukthanker et al. "GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights." NeurIPS 2022 Workshops: MetaLearn, 2022.](https://mlanthology.org/neuripsw/2022/sukthanker2022neuripsw-gravite/)

BibTeX

@inproceedings{sukthanker2022neuripsw-gravite,
  title     = {{GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights}},
  author    = {Sukthanker, Rhea Sanjay and Krishnakumar, Arjun and Patil, Sharat and Hutter, Frank},
  booktitle = {NeurIPS 2022 Workshops: MetaLearn},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/sukthanker2022neuripsw-gravite/}
}