Improving Hyperparameter Optimization with Checkpointed Model Weights
Abstract
As the scale of foundation models continues to grow, efficient hyperparameter optimization (HPO) becomes increasingly critical to manage the substantial computational resources required for training and downstream usage. Traditional HPO methods are often prohibitively expensive in these scenarios, motivating the need for more sophisticated approaches. Classical methods treat this as a black-box optimization problem. However, gray-box HPO methods, which incorporate more information about the setup, have emerged as a promising direction for more efficient optimization. In this work, we propose an HPO method for neural networks using logged checkpoints of trained weights to guide future hyperparameter selections. Our method, Forecasting Model Search (FMS), embeds weights into a Gaussian process deep kernel surrogate model, using a permutation-invariant graph metanetwork to be data-efficient with logged network weights. We open-source our code (https://github.com/NVlabs/forecasting-model-search).
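The core idea in the abstract can be illustrated with a minimal sketch (not the authors' implementation): embed a weight checkpoint with a permutation-invariant feature extractor (a stand-in for the graph metanetwork), then fit a Gaussian process with a deep kernel on those embeddings to predict validation loss for candidate runs. All function names and the toy data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(weights):
    # Stand-in for the permutation-invariant graph metanetwork:
    # simple permutation-invariant summary statistics of a checkpoint.
    w = np.asarray(weights, dtype=float)
    return np.array([w.mean(), w.std(), np.abs(w).mean()])

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential ("deep") kernel applied to embedded checkpoints.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

# Toy "logged checkpoints": weight vectors with observed validation losses.
checkpoints = [rng.normal(scale=s, size=100) for s in (0.1, 0.5, 1.0, 2.0)]
losses = np.array([0.9, 0.4, 0.5, 1.2])

X = np.stack([embed(w) for w in checkpoints])
K = rbf_kernel(X, X) + 1e-6 * np.eye(len(X))  # jitter for numerical stability

# GP posterior mean at a new checkpoint; in FMS-style HPO this surrogate
# prediction would guide which hyperparameters to try next.
x_new = embed(rng.normal(scale=0.6, size=100))[None, :]
k_star = rbf_kernel(x_new, X)
mean = k_star @ np.linalg.solve(K, losses)
print(float(mean[0]))
```

In the paper's setting, the hand-crafted `embed` above is replaced by a learned graph metanetwork trained jointly with the GP, which is what makes the surrogate data-efficient with respect to logged weights.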
Cite
Text
Mehta et al. "Improving Hyperparameter Optimization with Checkpointed Model Weights." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91979-4_8
Markdown
[Mehta et al. "Improving Hyperparameter Optimization with Checkpointed Model Weights." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/mehta2024eccvw-improving/) doi:10.1007/978-3-031-91979-4_8
BibTeX
@inproceedings{mehta2024eccvw-improving,
title = {{Improving Hyperparameter Optimization with Checkpointed Model Weights}},
author = {Mehta, Nikhil and Lorraine, Jonathan and Masson, Steve and Arunachalam, Ramanathan and Bhat, Zaid Pervaiz and Lucas, James and Zachariah, Arun George},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {75--96},
doi = {10.1007/978-3-031-91979-4_8},
url = {https://mlanthology.org/eccvw/2024/mehta2024eccvw-improving/}
}