Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

Abstract

Time-series data is essential in many scientific and industrial domains, such as environmental analysis, agriculture, transportation, and finance. Researchers rely on domain knowledge to mine insights from time-series data when studying scientific topics. However, this process is time-consuming and depends heavily on expert knowledge. This paper proposes a large-scale multimodal model (LMM), Insight Miner, to generate high-quality, comprehensive time-series descriptions enriched with domain-specific knowledge. To introduce rich time-series insights to Insight Miner, we propose a time-series analysis dataset, TS-Insights, composed of pairs of time series and textual insights. The TS-Insights dataset includes 100k time-series windows sampled from 20 forecasting datasets spanning a wide variety of domains and granularities. Using a combination of heuristics and statistical tools, we preprocess each raw time-series window and use GPT-4 to generate a coherent trend description based on the extracted features. After instruction tuning on the TS-Insights dataset, the Insight Miner model generates better time-series descriptions and insights than state-of-the-art multimodal models such as LLaVA and GPT-4. Our findings suggest a promising direction for leveraging LMMs in time-series analysis, potentially offering avenues for efficient insight mining in scientific domains. The TS-Insights dataset is available here: https://drive.google.com/drive/folders/1qGXigxE5GvmF1oLuGXaqLMkRgwoQfZ7V?usp=sharing.
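
The preprocessing step described in the abstract (extracting features from a raw window, then asking a language model for a trend description) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the heuristics or statistical tools used, so the moving-average smoothing, the least-squares slope, the feature names, and the prompt wording below are all assumptions chosen for illustration. The function names `moving_average`, `linear_slope`, `extract_features`, and `build_prompt` are hypothetical, and the call to a language model is left out.

```python
from statistics import mean


def moving_average(values, k):
    """Simple moving average; returns len(values) - k + 1 smoothed points."""
    return [mean(values[i:i + k]) for i in range(len(values) - k + 1)]


def linear_slope(values):
    """Ordinary least-squares slope of values against their index."""
    n = len(values)
    if n < 2:
        return 0.0  # a single point has no direction
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def extract_features(window, k=3):
    """Summarize a window with a few trend features (assumed, not from the paper)."""
    trend = moving_average(window, k)
    slope = linear_slope(trend)
    if slope > 0:
        direction = "increasing"
    elif slope < 0:
        direction = "decreasing"
    else:
        direction = "flat"
    return {
        "trend_start": trend[0],
        "trend_end": trend[-1],
        "min": min(window),
        "max": max(window),
        "slope": slope,
        "direction": direction,
    }


def build_prompt(features):
    """Format the features as a text prompt for a language model (hypothetical wording)."""
    return (
        "Describe the trend of a time series window with these features: "
        f"overall direction {features['direction']}, "
        f"smoothed value from {features['trend_start']} to {features['trend_end']}, "
        f"range {features['min']} to {features['max']}, "
        f"slope per step {features['slope']:.3f}."
    )


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6]
    print(build_prompt(extract_features(window)))
```

Passing extracted features as text, rather than raw values, keeps the prompt short and lets the model focus on wording the trend; whether TS-Insights was built exactly this way is not stated in the abstract.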

Cite

Text

Zhang et al. "Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language." NeurIPS 2023 Workshops: AI4Science, 2023.

Markdown

[Zhang et al. "Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language." NeurIPS 2023 Workshops: AI4Science, 2023.](https://mlanthology.org/neuripsw/2023/zhang2023neuripsw-insight/)

BibTeX

@inproceedings{zhang2023neuripsw-insight,
  title     = {{Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language}},
  author    = {Zhang, Yunkai and Zhang, Yawen and Zheng, Ming and Chen, Kezhen and Gao, Chongyang and Ge, Ruian and Teng, Siyuan and Jelloul, Amine and Rao, Jinmeng and Guo, Xiaoyuan and Fang, Chiang-Wei and Zheng, Zeyu and Yang, Jie},
  booktitle = {NeurIPS 2023 Workshops: AI4Science},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/zhang2023neuripsw-insight/}
}