Very Fast Streaming Submodular Function Maximization

Abstract

Data summarization has become a valuable tool for understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. These algorithms offer worst-case approximation guarantees at the expense of higher computation and memory requirements. However, many practical applications do not fall under this worst case and are usually much better behaved. In this paper, we propose a new submodular function maximization algorithm called ThreeSieves, which ignores the worst case but delivers a good solution with high probability. It selects the most informative items from a data stream on the fly and maintains provable performance on a fixed memory budget. In an extensive evaluation, we compare our method against $6$ other methods on $8$ different datasets with and without concept drift. We show that our algorithm outperforms current state-of-the-art algorithms and, at the same time, uses fewer resources. Last, we highlight a real-world use case of our algorithm for data summarization in gamma-ray astronomy. We make our code publicly available at this https URL.
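
To make the setting in the abstract concrete, the following is a minimal sketch of streaming submodular selection under a fixed budget. It is not the ThreeSieves algorithm itself, whose details are not given here. It uses a generic rule with a single fixed threshold on the marginal gain, and the names `coverage`, `stream_select`, `k` and `threshold` are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List


def coverage(summary: List[FrozenSet[int]]) -> int:
    """Toy monotone submodular function: number of distinct elements covered."""
    covered = set()
    for item in summary:
        covered |= item
    return len(covered)


def stream_select(stream: Iterable[FrozenSet[int]], k: int, threshold: float) -> List[FrozenSet[int]]:
    """Single pass over the stream, keeping at most k items.

    An item is kept if its marginal gain reaches the (assumed, fixed)
    threshold. This is a generic sketch, not the authors' method.
    """
    summary: List[FrozenSet[int]] = []
    current = 0
    for item in stream:
        if len(summary) >= k:
            break  # memory budget exhausted
        gain = coverage(summary + [item]) - current
        if gain >= threshold:
            summary.append(item)
            current += gain
    return summary


# Hypothetical stream of item sets, used only for illustration.
stream = [
    frozenset({1, 2, 3}),
    frozenset({2, 3}),
    frozenset({4, 5}),
    frozenset({1, 6}),
    frozenset({7, 8, 9}),
]
picked = stream_select(stream, k=3, threshold=2)
print(picked, coverage(picked))
```

In this sketch each stream item is inspected once and only the current summary is held in memory, which is the property the abstract refers to as a fixed memory budget. Choosing the threshold well is the hard part, and this toy example does not address it.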

Cite

Text

Buschjäger et al. "Very Fast Streaming Submodular Function Maximization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86523-8_10

Markdown

[Buschjäger et al. "Very Fast Streaming Submodular Function Maximization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/buschjager2021ecmlpkdd-very/) doi:10.1007/978-3-030-86523-8_10

BibTeX

@inproceedings{buschjager2021ecmlpkdd-very,
  title     = {{Very Fast Streaming Submodular Function Maximization}},
  author    = {Buschjäger, Sebastian and Honysz, Philipp-Jan and Pfahler, Lukas and Morik, Katharina},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {151--166},
  doi       = {10.1007/978-3-030-86523-8_10},
  url       = {https://mlanthology.org/ecmlpkdd/2021/buschjager2021ecmlpkdd-very/}
}