Communication Efficient Coresets for Empirical Loss Minimization

Abstract

In this paper, we study the problem of empirical loss minimization with l2-regularization in distributed settings with significant communication cost. Stochastic gradient descent (SGD) and its variants are popular techniques for solving these problems in large-scale applications. However, the communication cost of these techniques is usually high, leading to considerable performance degradation. We introduce a novel approach that reduces the communication cost while retaining good convergence properties. The key to our approach is the construction of a small summary of the data, called a coreset, at each iteration, and solving an easy optimization problem based on the coreset. We present a general framework for analyzing coreset-based optimization and provide interesting insights into existing algorithms from this perspective. We then propose a new coreset construction and provide its convergence analysis for a wide class of problems that includes logistic regression and support vector machines. We demonstrate the performance of our algorithm on real-world datasets and compare it against state-of-the-art algorithms.
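To illustrate the general idea of coreset-based optimization described in the abstract, the sketch below fits l2-regularized logistic regression on an importance-sampled, reweighted subset of the data instead of the full dataset. This is a minimal illustration only: the sampling probabilities here use a simple norm-based sensitivity proxy, and the paper's actual coreset construction, iteration scheme, and distributed protocol differ.

```python
import numpy as np

def coreset_logistic(X, y, m, lam=0.1, steps=200, lr=0.5, seed=0):
    """Sketch: sample a weighted coreset of size m, then run gradient
    descent for l2-regularized logistic regression on it.

    Labels y are in {-1, +1}. Sampling probabilities are proportional
    to per-point norms (a crude sensitivity proxy, assumed here for
    illustration); weights 1/(m*p_i) keep the coreset loss an unbiased
    estimate of the full loss.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    s = np.linalg.norm(X, axis=1) + 1e-12   # sensitivity proxy
    p = s / s.sum()
    idx = rng.choice(n, size=m, p=p)
    w = 1.0 / (m * p[idx])                  # unbiased reweighting
    Xc, yc = X[idx], y[idx]

    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        z = Xc @ theta
        # gradient of (1/n) * sum_i w_i * log(1 + exp(-y_i x_i^T theta))
        # plus the l2 term; uses sigmoid(y z) - 1 = -sigmoid(-y z)
        g = Xc.T @ ((1.0 / (1.0 + np.exp(-yc * z)) - 1.0) * yc * w) / n
        theta -= lr * (g + lam * theta)
    return theta
```

Because only the coreset (size m) needs to be communicated rather than the full data (size n), a scheme of this shape trades a small amount of estimation error for a large reduction in communication.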

Cite

Text

Reddi et al. "Communication Efficient Coresets for Empirical Loss Minimization." Conference on Uncertainty in Artificial Intelligence, 2015.

Markdown

[Reddi et al. "Communication Efficient Coresets for Empirical Loss Minimization." Conference on Uncertainty in Artificial Intelligence, 2015.](https://mlanthology.org/uai/2015/reddi2015uai-communication/)

BibTeX

@inproceedings{reddi2015uai-communication,
  title     = {{Communication Efficient Coresets for Empirical Loss Minimization}},
  author    = {Reddi, Sashank J. and Póczos, Barnabás and Smola, Alexander J.},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2015},
  pages     = {752--761},
  url       = {https://mlanthology.org/uai/2015/reddi2015uai-communication/}
}