Mixture of Experts for Time Series Foundation Models
Abstract
Time series foundation models such as MOIRAI have shown exceptional zero-shot forecasting capabilities. However, to enable cross-frequency learning, they employ multiple linear projection layers, each specialized for time series at a specific frequency. This design has two major limitations: (1) Time series data are imbalanced across frequencies, leaving the parameters for underrepresented frequencies insufficiently trained and diminishing the effectiveness of cross-frequency learning. (2) Specialization at the frequency level is coarse-grained: time series with similar patterns but different frequencies can be mapped to undesirably distinct embeddings, while time series of the same frequency can exhibit diverse patterns that a single linear layer lacks the capacity to handle. To address these issues holistically, this paper proposes MOIRAI-MOE, which uses a single projection layer and delegates the modeling of diverse time series patterns to a mixture of experts (MoE) within the Transformer. By leveraging experts for token-level specialization, MOIRAI-MOE achieves superior unified learning capabilities and delivers significant improvements in both in-distribution and zero-shot evaluations.
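The abstract describes the core architectural change: a single shared input projection instead of frequency-specific ones, with the dense feed-forward network in each Transformer block replaced by a sparse mixture of experts that routes individual tokens to experts. Below is a minimal, illustrative PyTorch sketch of that idea, not the authors' implementation; all module and parameter names (MoEFeedForward, patch_proj, the patch length, number of experts, and top-k value) are assumptions made for this example.

# Minimal sketch (not the MOIRAI-MOE codebase) of token-level MoE routing with a
# single shared input projection. Names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Token-level mixture-of-experts FFN: each token is routed to its top-k experts."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router producing expert logits
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_model)
        scores = self.gate(x)                                   # (B, T, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)          # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)                    # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                         # tokens assigned to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out


# A single shared patch projection stands in for the per-frequency projection layers.
patch_proj = nn.Linear(32, 256)           # 32-step patches -> d_model, shared across frequencies
moe_ffn = MoEFeedForward(d_model=256, d_hidden=512)

patches = torch.randn(4, 64, 32)          # (batch, num_patches, patch_len), any sampling frequency
tokens = patch_proj(patches)
print(moe_ffn(tokens).shape)              # torch.Size([4, 64, 256])

Because routing happens per token rather than per dataset frequency, patches with similar local patterns can be sent to the same experts regardless of their sampling rate, which is the token-level specialization the abstract contrasts with frequency-level projection layers.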
Cite
Text
Liu et al. "Mixture of Experts for Time Series Foundation Models." NeurIPS 2024 Workshops: TSALM, 2024.
Markdown
[Liu et al. "Mixture of Experts for Time Series Foundation Models." NeurIPS 2024 Workshops: TSALM, 2024.](https://mlanthology.org/neuripsw/2024/liu2024neuripsw-mixture/)
BibTeX
@inproceedings{liu2024neuripsw-mixture,
title = {{Mixture of Experts for Time Series Foundation Models}},
author = {Liu, Xu and Liu, Juncheng and Woo, Gerald and Aksu, Taha and Liu, Chenghao and Savarese, Silvio and Xiong, Caiming and Sahoo, Doyen},
booktitle = {NeurIPS 2024 Workshops: TSALM},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/liu2024neuripsw-mixture/}
}