Size Matters: Finding the Most Informative Set of Window Lengths

Abstract

Event sequences often contain continuous variability at different levels. In other words, their properties and characteristics change at different rates, concurrently. For example, the sales of a product may slowly become more frequent over a period of several weeks, but there may be interesting variation within a week at the same time. To provide an accurate and robust “view” of such multi-level structural behavior, one needs to determine the appropriate levels of granularity for analyzing the underlying sequence. We introduce the novel problem of finding the best set of window lengths for analyzing discrete event sequences. We define suitable criteria for choosing window lengths and propose an efficient method to solve the problem. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from two domains: text and DNA. We find that the optimal sets of window lengths themselves can provide new insight into the data, e.g., the burstiness of events affects the optimal window lengths for measuring the event frequencies.
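To make the notion of window length concrete, the following is a minimal sketch of measuring event frequency with sliding windows of several lengths. It is not the authors' method for selecting window lengths. The function name `windowed_frequencies`, the toy event positions, and the candidate lengths are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def windowed_frequencies(event_positions, sequence_length, window_length):
    """Relative event frequency in every full sliding window (step size 1).

    Illustrative helper only (an assumption, not taken from the paper).
    Each window covers the half-open index range [start, start + window_length).
    """
    positions = sorted(event_positions)
    freqs = []
    for start in range(sequence_length - window_length + 1):
        end = start + window_length
        # Number of events with start <= position < end.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        freqs.append(count / window_length)
    return freqs


if __name__ == "__main__":
    # Toy sequence of length 48 with two bursts and one isolated event.
    events = [2, 3, 4, 20, 21, 22, 23, 40]
    for w in (4, 12, 24):
        f = windowed_frequencies(events, 48, w)
        print(f"window length {w}: min={min(f):.3f}, max={max(f):.3f}")
```

Short windows respond strongly to bursts, while long windows smooth them out. This is the kind of trade-off that motivates choosing a set of window lengths rather than a single one.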

Cite

Text

Lijffijt et al. "Size Matters: Finding the Most Informative Set of Window Lengths." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33486-3_29

Markdown

[Lijffijt et al. "Size Matters: Finding the Most Informative Set of Window Lengths." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/lijffijt2012ecmlpkdd-size/) doi:10.1007/978-3-642-33486-3_29

BibTeX

@inproceedings{lijffijt2012ecmlpkdd-size,
  title     = {{Size Matters: Finding the Most Informative Set of Window Lengths}},
  author    = {Lijffijt, Jefrey and Papapetrou, Panagiotis and Puolamäki, Kai},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2012},
  pages     = {451--466},
  doi       = {10.1007/978-3-642-33486-3_29},
  url       = {https://mlanthology.org/ecmlpkdd/2012/lijffijt2012ecmlpkdd-size/}
}