Distributed Pareto Optimization for Subset Selection

Abstract

The subset selection problem, which selects a few items from a ground set, arises in many applications such as maximum coverage, influence maximization, and sparse regression. The recently proposed POSS algorithm is a powerful approximation solver for this problem. However, POSS requires centralized access to the full ground set, and is thus impractical for large-scale real-world applications, where the ground set is too large to be stored on a single machine. In this paper, we propose a distributed version of POSS (DPOSS) with a bounded approximation guarantee. DPOSS can be easily implemented in the MapReduce framework. Our extensive experiments using Spark, on various real-world data sets with sizes ranging from thousands to millions, show that DPOSS achieves competitive performance compared with the centralized POSS, and is almost always better than the state-of-the-art distributed greedy algorithm RandGreeDi.
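To give a concrete sense of the centralized baseline that DPOSS distributes, below is a minimal sketch of POSS-style Pareto optimization for subset selection: it treats the problem bi-objectively (maximize the objective f, minimize subset size), maintains an archive of non-dominated solutions under random bitwise mutation, and finally returns the best archived solution of size at most k. The function names, the iteration budget, and the toy max-coverage objective are illustrative assumptions, not the paper's implementation.

```python
import random

def poss(n, f, k, iters=3000, seed=0):
    """Sketch of Pareto Optimization for Subset Selection (POSS).

    Maximizes f(S) subject to |S| <= k over ground set {0, ..., n-1}.
    Bi-objective view: maximize f(S), minimize |S|; keep a Pareto archive.
    """
    rng = random.Random(seed)
    empty = frozenset()
    archive = {empty: f(empty)}  # archive of non-dominated (subset -> value)
    for _ in range(iters):
        # Pick a random archived solution, flip each bit with probability 1/n.
        s = rng.choice(list(archive))
        t = set(s)
        for i in range(n):
            if rng.random() < 1.0 / n:
                t.symmetric_difference_update({i})
        t = frozenset(t)
        if len(t) >= 2 * k:  # far-infeasible solutions are discarded
            continue
        ft = f(t)
        # Keep t only if no archived solution strictly dominates it.
        dominated = any(
            fv >= ft and len(sv) <= len(t) and (fv > ft or len(sv) < len(t))
            for sv, fv in archive.items()
        )
        if not dominated:
            # Remove archived solutions weakly dominated by t, then add t.
            archive = {sv: fv for sv, fv in archive.items()
                       if not (ft >= fv and len(t) <= len(sv))}
            archive[t] = ft
    # Return the best feasible solution (size at most k).
    best = max((s for s in archive if len(s) <= k), key=lambda s: archive[s])
    return set(best), archive[frozenset(best)]

# Toy max-coverage instance (hypothetical data): item i covers cover[i].
cover = [{1, 2, 3}, {3, 4}, {5}, {1}]
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0
subset, value = poss(n=4, f=f, k=2)
```

DPOSS, in contrast, partitions the ground set across machines, runs this kind of Pareto search locally on each partition, and merges the resulting candidate solutions in a reduce step, which is what makes it fit the MapReduce model.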

Cite

Text

Qian et al. "Distributed Pareto Optimization for Subset Selection." International Joint Conference on Artificial Intelligence, 2018. doi:10.24963/IJCAI.2018/207

Markdown

[Qian et al. "Distributed Pareto Optimization for Subset Selection." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/qian2018ijcai-distributed/) doi:10.24963/IJCAI.2018/207

BibTeX

@inproceedings{qian2018ijcai-distributed,
  title     = {{Distributed Pareto Optimization for Subset Selection}},
  author    = {Qian, Chao and Li, Guiying and Feng, Chao and Tang, Ke},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {1492--1498},
  doi       = {10.24963/IJCAI.2018/207},
  url       = {https://mlanthology.org/ijcai/2018/qian2018ijcai-distributed/}
}