Very Fast Streaming Submodular Function Maximization
Abstract
Data summarization has become a valuable tool in understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. These algorithms offer worst-case approximation guarantees at the expense of higher computation and memory requirements. However, many practical applications do not fall under this worst case, but are usually much better behaved. In this paper, we propose a new submodular function maximization algorithm called ThreeSieves, which ignores the worst case, but delivers a good solution with high probability. It selects the most informative items from a data stream on the fly and maintains a provable performance on a fixed memory budget. In an extensive evaluation, we compare our method against $6$ other methods on $8$ different datasets with and without concept drift. We show that our algorithm outperforms current state-of-the-art algorithms and, at the same time, uses fewer resources. Last, we highlight a real-world use case of our algorithm for data summarization in gamma-ray astronomy. We make our code publicly available at this https URL.
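To make the idea of selecting items from a stream on a fixed memory budget concrete, here is a minimal Python sketch of a single-threshold streaming selection rule. It is not the authors' implementation: the abstract does not specify ThreeSieves, so the acceptance rule, the threshold-lowering schedule, the parameter names (`K`, `m`, `eps`, `T`), and the toy coverage objective are all assumptions chosen for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List

Item = FrozenSet[int]


def coverage(summary: List[Item]) -> float:
    """Toy monotone submodular objective: number of distinct elements covered."""
    covered = set()
    for item in summary:
        covered |= item
    return float(len(covered))


def stream_select(
    stream: Iterable[Item],
    f: Callable[[List[Item]], float],
    K: int,
    m: float,
    eps: float = 0.5,
    T: int = 1,
) -> List[Item]:
    """Keep at most K items using one threshold guess v (assumed rule).

    m   : assumed upper bound on the value of any single item
    eps : assumed step size of the geometric threshold grid
    T   : assumed number of consecutive rejections before v is lowered
    """
    v = K * m              # start with the most optimistic guess
    summary: List[Item] = []
    f_S = 0.0              # current value f(summary)
    rejected = 0           # consecutive rejections under the current v
    for item in stream:
        if len(summary) >= K:
            break          # memory budget exhausted
        gain = f(summary + [item]) - f_S
        required = (v / 2 - f_S) / (K - len(summary))
        if gain >= required:
            summary.append(item)
            f_S += gain
            rejected = 0
        else:
            rejected += 1
            if rejected >= T:
                # Guess looks too optimistic: step down, but not below m.
                v = max(v / (1 + eps), m)
                rejected = 0
    return summary


items = [
    frozenset({1, 2, 3}),
    frozenset({1, 2}),
    frozenset({4, 5, 6}),
    frozenset({3}),
]
summary = stream_select(items, coverage, K=2, m=4.0)
print(len(summary), coverage(summary))
```

The sketch stores only the current summary, one threshold, and one counter, which is what keeps the memory footprint fixed; how closely this mirrors the paper's method should be checked against the paper itself.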
Cite
Text
Buschjäger et al. "Very Fast Streaming Submodular Function Maximization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86523-8_10
Markdown
[Buschjäger et al. "Very Fast Streaming Submodular Function Maximization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/buschjager2021ecmlpkdd-very/) doi:10.1007/978-3-030-86523-8_10
BibTeX
@inproceedings{buschjager2021ecmlpkdd-very,
title = {{Very Fast Streaming Submodular Function Maximization}},
author = {Buschjäger, Sebastian and Honysz, Philipp-Jan and Pfahler, Lukas and Morik, Katharina},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2021},
pages = {151--166},
doi = {10.1007/978-3-030-86523-8_10},
url = {https://mlanthology.org/ecmlpkdd/2021/buschjager2021ecmlpkdd-very/}
}