Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints
Abstract
Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80%$ of data deletion.
Cite
Text
Kazemi et al. "Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints." International Conference on Machine Learning, 2018.Markdown
[Kazemi et al. "Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/kazemi2018icml-scalable/)BibTeX
@inproceedings{kazemi2018icml-scalable,
title = {{Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints}},
author = {Kazemi, Ehsan and Zadimoghaddam, Morteza and Karbasi, Amin},
booktitle = {International Conference on Machine Learning},
year = {2018},
pages = {2544-2553},
volume = {80},
url = {https://mlanthology.org/icml/2018/kazemi2018icml-scalable/}
}