Resource-Constrained Neural Architecture Search on Language Models: A Case Study

Abstract

Transformer-based language models have achieved milestones in natural language processing, but they come with challenges, mainly due to their computational footprint. Applying automated machine learning to these models can democratize their use and foster further research and development. We present a case study that uses neural architecture search (NAS) to optimize DistilBERT in a resource-constrained environment with a $4\,000$ GPU-hour budget. We employ an evolutionary algorithm that operates on a two-level hierarchical search space and a segmented pipeline for component enhancement. While a larger compute budget would be required to reach state-of-the-art results, our results show efficient exploration and a strong correlation between pre-training and downstream performance. This suggests that pre-training validation can serve as a cutoff criterion during model training. Finally, our learning curve analysis highlights the potential for efficient resource allocation through an epoch-level stopping strategy that directs resources towards more promising candidate models. Future work should focus on scaling these insights to larger language models and more diverse tasks.
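To make the two ideas in the abstract concrete, the sketch below shows a minimal evolutionary search over a two-level hierarchical space (block type at the top level, per-block hyperparameters at the second level) combined with epoch-level early stopping on pre-training validation loss. This is an illustrative example, not the authors' implementation: the block types, hyperparameter ranges, budget numbers, and the simulated `pretrain_val_loss` function are all placeholder assumptions.

```python
# Illustrative sketch (not the paper's code): evolutionary NAS over a
# two-level hierarchical search space with epoch-level early stopping
# driven by pre-training validation loss. All names and numbers are
# hypothetical placeholders.
import random

# Level 1: block-level choices; Level 2: per-block hyperparameters.
BLOCK_TYPES = ["attention", "feed_forward"]
BLOCK_PARAMS = {
    "attention": {"num_heads": [4, 8, 12]},
    "feed_forward": {"hidden_dim": [768, 1536, 3072]},
}

def sample_architecture(num_blocks=6):
    """Sample a candidate: a list of (block type, hyperparameters) pairs."""
    arch = []
    for _ in range(num_blocks):
        block = random.choice(BLOCK_TYPES)
        params = {k: random.choice(v) for k, v in BLOCK_PARAMS[block].items()}
        arch.append((block, params))
    return arch

def mutate(arch):
    """Mutate one block: optionally swap its type, then resample its hyperparameters."""
    child = list(arch)
    i = random.randrange(len(child))
    block, _ = child[i]
    if random.random() < 0.5:
        block = random.choice(BLOCK_TYPES)
    params = {k: random.choice(v) for k, v in BLOCK_PARAMS[block].items()}
    child[i] = (block, params)
    return child

def pretrain_val_loss(arch, epoch):
    """Stand-in for one epoch of pre-training followed by validation;
    in practice this would train the candidate and return its validation loss."""
    base = 2.0 + 0.1 * sum(p.get("num_heads", 0) == 4 for _, p in arch)
    return base / (1 + epoch) + random.uniform(0.0, 0.05)

def train_with_early_stop(arch, max_epochs=5, patience=1, min_delta=0.01):
    """Epoch-level stopping: abandon a candidate once its pre-training
    validation loss stops improving, freeing budget for other candidates."""
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        loss = pretrain_val_loss(arch, epoch)
        if best - loss > min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale > patience:
                break
    return best

def evolutionary_search(generations=10, population_size=8, tournament=3):
    """Tournament-selection evolutionary loop; lower validation loss is better."""
    population = [(a, train_with_early_stop(a))
                  for a in (sample_architecture() for _ in range(population_size))]
    for _ in range(generations):
        parent = min(random.sample(population, tournament), key=lambda x: x[1])[0]
        child = mutate(parent)
        population.append((child, train_with_early_stop(child)))
        population.remove(max(population, key=lambda x: x[1]))  # drop the worst candidate
    return min(population, key=lambda x: x[1])

if __name__ == "__main__":
    best_arch, best_loss = evolutionary_search()
    print("best validation loss:", round(best_loss, 3))
    print("best architecture:", best_arch)
```

Under the budget constraint described in the abstract, the key design choice this sketch mirrors is that candidates are ranked on cheap pre-training validation signals and cut off at the epoch level, rather than fully fine-tuned and evaluated on downstream tasks.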

Cite

Text

Paraskeva et al. "Resource-Constrained Neural Architecture Search on Language Models: A Case Study." ICML 2024 Workshops: WANT, 2024.

Markdown

[Paraskeva et al. "Resource-Constrained Neural Architecture Search on Language Models: A Case Study." ICML 2024 Workshops: WANT, 2024.](https://mlanthology.org/icmlw/2024/paraskeva2024icmlw-resourceconstrained/)

BibTeX

@inproceedings{paraskeva2024icmlw-resourceconstrained,
  title     = {{Resource-Constrained Neural Architecture Search on Language Models: A Case Study}},
  author    = {Paraskeva, Andreas and Reis, Joao Pedro and Verberne, Suzan and van Rijn, Jan N.},
  booktitle = {ICML 2024 Workshops: WANT},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/paraskeva2024icmlw-resourceconstrained/}
}