Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and More
Abstract
A coreset (or core-set) is a small weighted subset $Q$ of an input set $P$, with respect to a given monotonic function $f:\mathbb{R}\to\mathbb{R}$, that provably approximates the fitting loss $\sum_{p\in P}f(p\cdot x)$ for every $x\in\mathbb{R}^d$. Using $Q$, we can obtain an approximation of the $x^*$ that minimizes this loss by running existing optimization algorithms on $Q$. In this work we provide: (i) a lower bound which proves that there are input sets that admit no coreset smaller than $n=|P|$ for general monotonic loss functions; (ii) a proof that, with an additional common regularization term and under a natural assumption that holds, e.g., for logistic regression and the sigmoid activation function, a small coreset exists for every input $P$; (iii) a generic coreset construction algorithm that computes such a small coreset $Q$ in $O(nd+n\log n)$ time; and (iv) experimental results with open-source code which demonstrate that our coresets are effective and, in practice, much smaller than the theory predicts.
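To make the abstract's setup concrete, here is a minimal sketch of sensitivity (importance) sampling for a regularized monotonic loss. This is not the paper's algorithm: the sampling probabilities below use a hypothetical norm-based surrogate in place of the tight sensitivity bounds the paper derives, and `sensitivity_sample`, `reg_loss`, and the regularizer $\lambda\lVert x\rVert_2^2$ are illustrative choices.

```python
import numpy as np

def sensitivity_sample(P, w, m, seed=0):
    """Generic sensitivity-sampling sketch (NOT the paper's construction).

    P: (n, d) input points; w: (n,) positive weights; m: coreset size.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    norms = np.linalg.norm(P, axis=1)
    # Surrogate importance: a uniform term plus a norm-proportional term.
    # (Hypothetical upper bound; the paper bounds the true sensitivities.)
    s = w * (1.0 / n + norms / max(norms.sum(), 1e-12))
    prob = s / s.sum()
    idx = rng.choice(n, size=m, replace=True, p=prob)
    # Reweighting makes the coreset loss an unbiased estimate of the full loss.
    return P[idx], w[idx] / (m * prob[idx])

def reg_loss(P, w, x, lam):
    """Regularized fitting loss sum_i w_i * f(p_i . x) + lam * ||x||^2,
    with f(t) = log(1 + e^t) (logistic regression; labels folded into p)."""
    return np.sum(w * np.logaddexp(0.0, P @ x)) + lam * np.dot(x, x)

# Usage: the loss on the coreset tracks the loss on the full input set.
P = np.random.default_rng(1).normal(size=(10_000, 5))
w = np.ones(len(P))
Q, wq = sensitivity_sample(P, w, m=500)
x = np.ones(5)
print(reg_loss(P, w, x, lam=0.1), reg_loss(Q, wq, x, lam=0.1))
```

Any off-the-shelf optimizer can then be run on the returned pair $(Q, w)$ instead of $P$; per the abstract, it is the regularization term that makes a small coreset possible for every input.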
Cite
Text
Tolochinsky et al. "Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and More." International Conference on Machine Learning, 2022.
Markdown
[Tolochinsky et al. "Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and More." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/tolochinksy2022icml-generic/)
BibTeX
@inproceedings{tolochinksy2022icml-generic,
title = {{Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and More}},
author = {Tolochinsky, Elad and Jubran, Ibrahim and Feldman, Dan},
booktitle = {International Conference on Machine Learning},
year = {2022},
pages = {21520--21547},
volume = {162},
url = {https://mlanthology.org/icml/2022/tolochinksy2022icml-generic/}
}