Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams
Abstract
Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, singlepass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
Cite
Text
Norouzi-Fard et al. "Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams." International Conference on Machine Learning, 2018.Markdown
[Norouzi-Fard et al. "Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/norouzifard2018icml-beyond/)BibTeX
@inproceedings{norouzifard2018icml-beyond,
title = {{Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams}},
author = {Norouzi-Fard, Ashkan and Tarnawski, Jakub and Mitrovic, Slobodan and Zandieh, Amir and Mousavifar, Aidasadat and Svensson, Ola},
booktitle = {International Conference on Machine Learning},
year = {2018},
pages = {3829-3838},
volume = {80},
url = {https://mlanthology.org/icml/2018/norouzifard2018icml-beyond/}
}