Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?

Abstract

The recent development of foundation models for time series data has generated considerable interest in applying them across a variety of domains. Although these models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, even though calibration can be critical in many practical applications. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform a series of systematic evaluations assessing model calibration (i.e., over- or under-confidence), the effects of varying prediction heads, and calibration under long-term autoregressive forecasting. We find that time series foundation models are consistently better calibrated than the baseline models and tend to be neither systematically over- nor under-confident, in contrast to the overconfidence often seen in other deep learning models.
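To make the notion of over- and under-confidence concrete, the following is a minimal sketch of one common way to check the calibration of quantile forecasts: compare the empirical coverage of each predicted quantile with its nominal level. It is not taken from the paper. The function names, the synthetic Gaussian data, and the choice of quantile levels are all assumptions made for illustration, and the paper's actual metrics may differ.

```python
from statistics import NormalDist

import numpy as np


def quantile_coverage(y_true, q_pred):
    """Fraction of observations at or below each predicted quantile.

    y_true: shape (n_obs,); q_pred: shape (n_obs, n_levels).
    """
    y_true = np.asarray(y_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return (y_true[:, None] <= q_pred).mean(axis=0)


def calibration_error(y_true, q_pred, levels):
    """Mean absolute gap between empirical and nominal quantile coverage."""
    coverage = quantile_coverage(y_true, q_pred)
    return float(np.abs(coverage - np.asarray(levels, dtype=float)).mean())


def interval_coverage(y_true, lower, upper):
    """Fraction of observations inside the interval [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    return float(((y_true >= lower) & (y_true <= upper)).mean())


def gaussian_quantiles(levels, sigma, n_obs):
    """Hypothetical forecaster: the same N(0, sigma^2) quantiles at every step."""
    q = np.array([NormalDist(0.0, sigma).inv_cdf(p) for p in levels])
    return np.tile(q, (n_obs, 1))


# Synthetic illustration (assumption): the true data are standard normal.
levels = [0.1, 0.5, 0.9]
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=20_000)

calibrated = gaussian_quantiles(levels, sigma=1.0, n_obs=y.size)
overconfident = gaussian_quantiles(levels, sigma=0.5, n_obs=y.size)  # too narrow

print("calibrated error:   ", calibration_error(y, calibrated, levels))
print("overconfident error:", calibration_error(y, overconfident, levels))
print(
    "80% interval coverage (overconfident):",
    interval_coverage(y, overconfident[:, 0], overconfident[:, -1]),
)
```

In this sketch, a forecaster whose nominal 80% interval covers noticeably fewer than 80% of observations is overconfident (its intervals are too narrow), while coverage well above 80% would indicate under-confidence.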

Cite

Text

Adler et al. "Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?" International Conference on Learning Representations, 2026.

Markdown

[Adler et al. "Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?" International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/adler2026iclr-beyond/)

BibTeX

@inproceedings{adler2026iclr-beyond,
  title     = {{Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?}},
  author    = {Adler, Coen and Chang, Yuxin and Abdi, Samar and Draxler, Felix and Smyth, Padhraic},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/adler2026iclr-beyond/}
}