GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights
Abstract
Differentiable one-shot neural architecture search methods have recently become popular since they can exploit weight-sharing to efficiently search in large architectural search spaces. These methods traditionally perform a continuous relaxation of the discrete search space to search for an optimal architecture. However, they suffer from large memory requirements, making their application to parameter-heavy architectures like transformers difficult. Recently, single-path one-shot methods have been introduced that often alleviate this issue through weight entanglement, sampling the weights of each sub-network from the largest model, which itself serves as the supernet. In this work, we propose a continuous relaxation of the weight entanglement-based architectural representation. Our Gradient-based Vision Transformer Search with Entangled Weights (GraViT-E) combines the best properties of both differentiable one-shot NAS and weight entanglement. We observe that our method imparts much better regularization properties and memory efficiency to the trained supernet. We study three one-shot optimizers on the Vision Transformer search space and observe that our method outperforms existing baselines on multiple datasets while being up to 35% more parameter-efficient on ImageNet-1k.
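The two ideas the abstract combines can be illustrated with a minimal sketch: weight entanglement stores a single weight matrix and carves every candidate sub-network out of it as a slice, while the continuous relaxation mixes all candidates with softmax-weighted architecture parameters so the choice becomes differentiable. The class name, shapes, and mixing scheme below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

class EntangledLinear:
    """Illustrative sketch (not the paper's code): all candidate output
    widths share slices of ONE weight matrix (weight entanglement), and
    architecture parameters `alpha` define a softmax mixture over the
    candidates (continuous relaxation)."""

    def __init__(self, max_in, max_out, out_choices, seed=0):
        rng = np.random.default_rng(seed)
        # Memory cost is that of the largest choice only -- the supernet
        # weight matrix is the single source for every sub-network.
        self.weight = rng.standard_normal((max_out, max_in)) * 0.01
        self.out_choices = out_choices          # e.g. candidate MLP widths
        self.alpha = np.zeros(len(out_choices)) # architecture parameters

    def forward(self, x):
        probs = softmax(self.alpha)
        max_out = self.weight.shape[0]
        mixed = np.zeros((x.shape[0], max_out))
        for p, w in zip(probs, self.out_choices):
            y = x @ self.weight[:w, :].T  # sub-network = slice of supernet
            mixed[:, :w] += p * y         # zero-pad to max width, then mix
        return mixed
```

In a differentiable-NAS framework the `alpha` parameters would be trained by gradient descent alongside the shared weights; after search, the candidate with the highest weight is kept and the others are discarded at no extra parameter cost, since all candidates alias the same matrix.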
Cite
Text
Sukthanker et al. "GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights." NeurIPS 2022 Workshops: MetaLearn, 2022.

Markdown
[Sukthanker et al. "GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights." NeurIPS 2022 Workshops: MetaLearn, 2022.](https://mlanthology.org/neuripsw/2022/sukthanker2022neuripsw-gravite/)

BibTeX
@inproceedings{sukthanker2022neuripsw-gravite,
title = {{GraViT-E: Gradient-Based Vision Transformer Search with Entangled Weights}},
author = {Sukthanker, Rhea Sanjay and Krishnakumar, Arjun and Patil, Sharat and Hutter, Frank},
booktitle = {NeurIPS 2022 Workshops: MetaLearn},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/sukthanker2022neuripsw-gravite/}
}