Scaling-Laws for Large Time-Series Models
Abstract
Scaling laws for large language models (LLMs) have provided useful guidance in training ever larger models for predictable performance gains. Time series forecasting shares a similar sequential structure to language, and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit analogous scaling behavior to LLMs, with architectural details (aspect ratio and number of heads) having a minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish for the first time power-law scaling with parameter count, dataset size, and training compute, spanning five orders of magnitude.
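For orientation, the power-law scaling referred to in the abstract is conventionally written in the form used in the LLM scaling-law literature (e.g. Kaplan et al., 2020): test loss falls as a power law in each resource. The expressions below show that standard parameterization for illustration only; the exponents and reference scales are placeholders, not values fitted in this paper.

% Generic power-law scaling ansatz, shown for illustration (not this paper's fitted values).
% L = test loss; N = parameter count; D = dataset size; C = training compute.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}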
Cite

Alsing et al. "Scaling-Laws for Large Time-Series Models." NeurIPS 2024 Workshops: TSALM, 2024.
https://mlanthology.org/neuripsw/2024/alsing2024neuripsw-scalinglaws/

BibTeX
@inproceedings{alsing2024neuripsw-scalinglaws,
title = {{Scaling-Laws for Large Time-Series Models}},
author = {Alsing, Justin and Edwards, Thomas and Wandelt, Benjamin Dan and Alvey, James and Nguyen, Nam H},
booktitle = {NeurIPS 2024 Workshops: TSALM},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/alsing2024neuripsw-scalinglaws/}
}