CAPE: Generalized Convergence Prediction Across Architectures Without Full Training

Abstract

Training deep neural networks to convergence is expensive and time-consuming, especially when exploring new architectures or hardware configurations. Prior work has primarily estimated per-iteration or per-epoch cost under fixed training schedules, overlooking the harder problem of predicting how long a model will take to converge. We present CAPE (Convergence-Aware Prediction Engine), a lightweight, probing-based framework that predicts the number of epochs required for convergence before any full training occurs. CAPE runs a brief probe at initialization on a small batch of data to extract analytical and dynamical features such as parameter count, dataset size, learning rate, batch size, gradient norm, Neural Tangent Kernel (NTK) trace, and initial loss. These features jointly characterize the model’s optimization landscape and serve as input to a meta-model trained to forecast convergence horizons under a validation-based early-stopping criterion. Its predictions achieve a Pearson correlation of 0.89 with true convergence epochs across diverse architectures and datasets, indicating consistent convergence prediction across model families. By enabling zero-shot prediction of full-dataset convergence behavior, CAPE provides a practical tool for rapid model selection, hyperparameter exploration, and resource-aware training planning.
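To make the probe step concrete, the following is a minimal sketch of extracting the listed features from a single batch at initialization. It is not the authors' implementation. It assumes a toy linear model f(x) = w·x + b with mean squared error, chosen so that gradients and the NTK diagonal have closed forms. The function name `probe_features` and the dictionary keys are hypothetical, and the NTK trace is taken here as the sum over the batch of the squared norm of the output gradient with respect to the parameters.

```python
import math


def probe_features(weights, bias, batch, dataset_size, learning_rate):
    """Probe a linear model f(x) = w.x + b (MSE loss) on one small batch.

    Hypothetical helper for illustration; `batch` is a list of (x, y) pairs,
    where x is a list of floats and y is a float.
    """
    n = len(batch)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    loss = 0.0
    ntk_trace = 0.0

    for x, y in batch:
        pred = sum(w * xi for w, xi in zip(weights, x)) + bias
        err = pred - y
        loss += err * err / n

        # Gradient of the mean squared error with respect to w and b.
        for j, xi in enumerate(x):
            grad_w[j] += 2.0 * err * xi / n
        grad_b += 2.0 * err / n

        # NTK diagonal entry ||df(x)/dtheta||^2, which for a linear model
        # is ||x||^2 (from w) plus 1 (from b).
        ntk_trace += sum(xi * xi for xi in x) + 1.0

    grad_norm = math.sqrt(sum(g * g for g in grad_w) + grad_b * grad_b)

    return {
        "param_count": len(weights) + 1,
        "dataset_size": dataset_size,
        "learning_rate": learning_rate,
        "batch_size": n,
        "grad_norm": grad_norm,
        "ntk_trace": ntk_trace,
        "initial_loss": loss,
    }


if __name__ == "__main__":
    # Two-sample probe batch with zero-initialized parameters.
    probe_batch = [([1.0, 0.0], 1.0), ([0.0, 2.0], 2.0)]
    features = probe_features([0.0, 0.0], 0.0, probe_batch,
                              dataset_size=1000, learning_rate=0.01)
    for name, value in features.items():
        print(name, value)
```

In a setup like the one the abstract describes, a feature vector of this kind would be the input to the meta-model that forecasts the convergence epoch. For a real network, the gradient norm and NTK trace would come from automatic differentiation rather than the closed forms used here.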

Cite

Text

Pourali et al. "CAPE: Generalized Convergence Prediction Across Architectures Without Full Training." Transactions on Machine Learning Research, 2026.

Markdown

[Pourali et al. "CAPE: Generalized Convergence Prediction Across Architectures Without Full Training." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/pourali2026tmlr-cape/)

BibTeX

@article{pourali2026tmlr-cape,
  title     = {{CAPE: Generalized Convergence Prediction Across Architectures Without Full Training}},
  author    = {Pourali, Alireza and Boukani, Arian and Khazaei, Hamzeh},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/pourali2026tmlr-cape/}
}