Sub-Sampled Newton Methods with Non-Uniform Sampling

Abstract

We consider the problem of finding the minimizer of a convex function $F: \mathbb{R}^d \rightarrow \mathbb{R}$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$, where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. We propose randomized Newton-type algorithms that exploit *non-uniform* sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as a means to reduce the computational complexity, and are applicable to a wide range of problems in machine learning. Two non-uniform sampling distributions based on *block norm squares* and *block partial leverage scores* are considered. Under certain assumptions, we show that our algorithms inherit a linear-quadratic convergence rate in $w$ and achieve a lower computational complexity compared to similar existing methods. In addition, we show that our algorithms exhibit more robustness and better dependence on problem-specific quantities, such as the condition number. We numerically demonstrate the advantages of our algorithms on several real datasets.
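To make the abstract's idea concrete, here is a minimal Python sketch of one sub-sampled Newton step with a block norm squares distribution ($p_i \propto \|A_i\|_F^2$), assuming each Hessian contribution factors as $\nabla^2 f_i(w) = A_i(w)^\top A_i(w)$ as the abstract's low-rank factorization suggests. This is an illustration under those assumptions, not the authors' implementation; the names `A_of_w`, `grad_F`, and `sample_size`, and the small ridge term, are hypothetical.

```python
# A minimal sketch of one sub-sampled Newton step with non-uniform
# (block norm squares) sampling. Hypothetical interface, not the
# authors' code: A_of_w(w) returns the factors A_i (each k_i x d),
# grad_F(w) returns the gradient of the full objective F at w.
import numpy as np

def subsampled_newton_step(w, A_of_w, grad_F, sample_size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    blocks = A_of_w(w)
    # Block norm squares distribution: p_i proportional to ||A_i||_F^2.
    norms = np.array([np.linalg.norm(A, "fro") ** 2 for A in blocks])
    p = norms / norms.sum()
    idx = rng.choice(len(blocks), size=sample_size, p=p)
    d = blocks[0].shape[1]
    # Unbiased sub-sampled Hessian: average of rescaled sampled blocks.
    H = np.zeros((d, d))
    for i in idx:
        H += blocks[i].T @ blocks[i] / (sample_size * p[i])
    # Newton system solved exactly here for brevity; the paper also
    # allows inexact solves (e.g. a few conjugate-gradient iterations).
    # The small ridge term is an assumed numerical safeguard.
    step = np.linalg.solve(H + 1e-8 * np.eye(d), grad_F(w))
    return w - step
```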

Cite

Text

Xu et al. "Sub-Sampled Newton Methods with Non-Uniform Sampling." Neural Information Processing Systems, 2016.

Markdown

[Xu et al. "Sub-Sampled Newton Methods with Non-Uniform Sampling." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/xu2016neurips-subsampled/)

BibTeX

@inproceedings{xu2016neurips-subsampled,
  title     = {{Sub-Sampled Newton Methods with Non-Uniform Sampling}},
  author    = {Xu, Peng and Yang, Jiyan and Roosta, Fred and Ré, Christopher and Mahoney, Michael W.},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {3000--3008},
  url       = {https://mlanthology.org/neurips/2016/xu2016neurips-subsampled/}
}