mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale

Abstract

Anomaly detection in multivariate time series is essential across domains such as healthcare, cybersecurity, and industrial monitoring, yet it remains fundamentally challenging due to high-dimensional dependencies, cross-correlations between time-dependent variables, and the scarcity of labeled anomalies. We introduce mTSBench, the largest benchmark to date for multivariate time series anomaly detection and model selection, consisting of 344 labeled time series across 19 datasets from a wide range of application domains. We comprehensively evaluate 24 anomaly detectors, including the only two publicly available large language model (LLM)-based methods for multivariate time series. Consistent with prior findings, we observe that no single detector dominates across datasets, which motivates the need for effective model selection. We benchmark three recent model selection methods and find that even the strongest remains far from optimal. Our results highlight the outstanding need for robust, generalizable selection strategies. We open-source the benchmark at https://plan-lab.github.io/mtsbench to encourage future research.
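The two claims above, that no single detector dominates and that a selector can be "far from optimal," are easiest to see side by side. The minimal sketch below compares three quantities: the best fixed detector, a per-series oracle that always picks the top detector, and a model selector's per-series choices. It assumes a table of per-series accuracy scores. The detector names, series names, and score values are hypothetical illustrations, not mTSBench results. The metric is also left unspecified, and this is not presented as the paper's evaluation protocol.

```python
# Hypothetical per-series detection scores (higher is better).
# Illustrative numbers only; NOT results from mTSBench.
scores = {
    "series_a": {"det_1": 0.82, "det_2": 0.41, "det_3": 0.55},
    "series_b": {"det_1": 0.30, "det_2": 0.77, "det_3": 0.52},
    "series_c": {"det_1": 0.45, "det_2": 0.50, "det_3": 0.91},
}


def oracle_score(scores):
    """Mean score when the best detector is picked separately for each series."""
    return sum(max(per_det.values()) for per_det in scores.values()) / len(scores)


def best_single_detector(scores):
    """Return the detector with the highest mean score across all series, and that mean."""
    detectors = next(iter(scores.values())).keys()
    means = {
        d: sum(per_det[d] for per_det in scores.values()) / len(scores)
        for d in detectors
    }
    best = max(means, key=means.get)
    return best, means[best]


def selector_score(scores, choices):
    """Mean score achieved by a selector's per-series choices (series -> detector)."""
    return sum(scores[ts][det] for ts, det in choices.items()) / len(scores)


# A hypothetical selector that gets two of three series right.
choices = {"series_a": "det_1", "series_b": "det_3", "series_c": "det_3"}
print(round(oracle_score(scores), 3))           # per-series oracle
print(best_single_detector(scores))             # best fixed detector
print(round(selector_score(scores, choices), 3))  # selector
```

In this toy table each detector wins on a different series, so every fixed choice falls short of the oracle. A selector helps only to the extent that its per-series picks close the gap between the best fixed detector and the oracle.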

Cite

Text

Zhou et al. "mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale." Transactions on Machine Learning Research, 2026.

Markdown

[Zhou et al. "mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/zhou2026tmlr-mtsbench/)

BibTeX

@article{zhou2026tmlr-mtsbench,
  title     = {{mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale}},
  author    = {Zhou, Xiaona and Brif, Constantin and Lourentzou, Ismini},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/zhou2026tmlr-mtsbench/}
}