Scaling-Laws for Large Time-Series Models

Abstract

Scaling laws for large language models (LLMs) have provided useful guidance in training ever larger models for predictable performance gains. Time series forecasting shares a sequential structure similar to that of language, and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit analogous scaling behavior to LLMs, with architectural details (aspect ratio and number of heads) having a minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish, for the first time, power-law scaling with parameter count, dataset size, and training compute, spanning five orders of magnitude.
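For context, scaling laws of this kind are conventionally written as power laws in each resource. The generic form below follows the LLM scaling-law literature (e.g. Kaplan et al.) and is illustrative only; the specific constants and exponents are not taken from this paper:

  L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
  L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
  L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}

where L is the test loss, N the parameter count, D the dataset size, C the training compute, and the reference scales N_c, D_c, C_c and exponents \alpha are fit to empirical training runs. The paper's contribution is establishing that fits of this form hold for decoder-only time series transformers across five orders of magnitude.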

Cite

Text

Alsing et al. "Scaling-Laws for Large Time-Series Models." NeurIPS 2024 Workshops: TSALM, 2024.

Markdown

[Alsing et al. "Scaling-Laws for Large Time-Series Models." NeurIPS 2024 Workshops: TSALM, 2024.](https://mlanthology.org/neuripsw/2024/alsing2024neuripsw-scalinglaws/)

BibTeX

@inproceedings{alsing2024neuripsw-scalinglaws,
  title     = {{Scaling-Laws for Large Time-Series Models}},
  author    = {Alsing, Justin and Edwards, Thomas and Wandelt, Benjamin Dan and Alvey, James and Nguyen, Nam H},
  booktitle = {NeurIPS 2024 Workshops: TSALM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/alsing2024neuripsw-scalinglaws/}
}