Reservoir Pattern Sampling in Data Streams
Abstract
Many applications generate data streams for which online analysis is essential. In this context, pattern mining is a complex task because it requires access to all data observations. To overcome this problem, state-of-the-art methods maintain a data sample or a compact data structure retaining only recent information on the main patterns. This paper addresses online pattern discovery in data streams based on pattern sampling techniques. Benefiting from reservoir sampling, we propose a generic algorithm that uses limited memory and integrates a wide spectrum of temporal biases simulating a landmark window, a sliding window, or an exponentially damped window. For these three window models, we provide fast damping optimizations and we study their time complexity. Experiments show that the algorithms perform particularly well. Finally, we illustrate the value of our approach with online outlier detection in data streams.
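For readers unfamiliar with the reservoir sampling that the abstract builds on, the following is a minimal sketch of the classic uniform technique (often called Algorithm R). It is not the paper's algorithm: it samples raw stream items rather than patterns and applies no temporal bias. The function name `reservoir_sample` and its parameters are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream.

    Illustrative sketch of classic reservoir sampling (Algorithm R).
    Memory use is O(k) regardless of the stream length.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 10, random.Random(0))
short = reservoir_sample(range(3), 10)
```

Each item is seen once and the stream is never stored, which is what makes the approach suitable for the limited-memory setting the abstract describes. A windowed or damped variant would change the replacement probability, which this sketch does not attempt.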
Cite
Text
Giacometti and Soulet. "Reservoir Pattern Sampling in Data Streams." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86486-6_21
Markdown
[Giacometti and Soulet. "Reservoir Pattern Sampling in Data Streams." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/giacometti2021ecmlpkdd-reservoir/) doi:10.1007/978-3-030-86486-6_21
BibTeX
@inproceedings{giacometti2021ecmlpkdd-reservoir,
title = {{Reservoir Pattern Sampling in Data Streams}},
author = {Giacometti, Arnaud and Soulet, Arnaud},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2021},
pages = {337-352},
doi = {10.1007/978-3-030-86486-6_21},
url = {https://mlanthology.org/ecmlpkdd/2021/giacometti2021ecmlpkdd-reservoir/}
}