DARTS Without a Validation Set: Optimizing the Marginal Likelihood
Abstract
The success of neural architecture search (NAS) has historically been limited by excessive compute requirements. While modern weight-sharing NAS methods such as DARTS can finish the search in single-digit GPU days, extracting the final best architecture from the shared weights is notoriously unreliable. Training-Speed-Estimate (TSE), a recently developed generalization estimator with a Bayesian marginal likelihood interpretation, has previously been used in place of the validation loss for gradient-based optimization in DARTS. This prevents the DARTS skip-connection collapse and significantly improves performance on NAS-Bench-201 and the original DARTS search space. We extend those results by applying various DARTS diagnostics and show several unusual behaviors arising from not using a validation set. Furthermore, our experiments yield concrete examples of the depth gap and of topology selection in DARTS having a strongly negative impact on search performance, even though both receive limited attention in the literature compared to operation selection.
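The mechanism the abstract describes swaps the DARTS validation loss for TSE, the running sum of training losses. Below is a minimal first-order sketch of one such architecture update, assuming a DARTS-style supernet whose forward pass depends on architecture parameters held by a_opt while w_opt holds the network weights; all names here (tse_arch_step, inner_steps, and so on) are illustrative rather than the paper's actual code, and the paper's exact procedure (for instance, how many steps TSE sums over) may differ.

import torch.nn.functional as F

def tse_arch_step(model, w_opt, a_opt, train_iter, inner_steps=5):
    # Hypothetical sketch: one architecture update driven by TSE, i.e. the
    # sum of training losses accumulated over inner_steps weight updates.
    a_opt.zero_grad()                 # architecture params accumulate the TSE gradient
    for _ in range(inner_steps):
        x, y = next(train_iter)       # training data only; no validation split
        loss = F.cross_entropy(model(x), y)
        w_opt.zero_grad()             # clear weight grads, keep architecture grads
        loss.backward()               # one backward fills grads for weights and arch params
        w_opt.step()                  # supernet weights train as usual
    a_opt.step()                      # first-order architecture step from the summed gradient

Because every term of the objective is a training loss, all available data can be used for training, which is what lets this variant of DARTS dispense with the validation set.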
Cite
Text
Fil et al. "DARTS Without a Validation Set: Optimizing the Marginal Likelihood." NeurIPS 2021 Workshops: MetaLearn, 2021.
Markdown
[Fil et al. "DARTS Without a Validation Set: Optimizing the Marginal Likelihood." NeurIPS 2021 Workshops: MetaLearn, 2021.](https://mlanthology.org/neuripsw/2021/fil2021neuripsw-darts/)
BibTeX
@inproceedings{fil2021neuripsw-darts,
  title = {{DARTS Without a Validation Set: Optimizing the Marginal Likelihood}},
  author = {Fil, Miroslav and Ru, Binxin and Lyle, Clare and Gal, Yarin},
  booktitle = {NeurIPS 2021 Workshops: MetaLearn},
  year = {2021},
  url = {https://mlanthology.org/neuripsw/2021/fil2021neuripsw-darts/}
}