Random Reshuffling Is Not Always Better

Abstract

Many learning algorithms, such as stochastic gradient descent, are affected by the order in which training examples are used. It is often observed that sampling the training examples without replacement, also known as random reshuffling, causes learning algorithms to converge faster. We give a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms (Recht and Ré, "Toward a noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences," COLT 2012). We use this to give an example of a learning task and algorithm for which with-replacement random sampling actually outperforms random reshuffling.
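
For context, the conjecture in question compares averaged products of positive semidefinite matrices under the two sampling schemes. The following is a paraphrase of one form of the Recht–Ré conjecture, not the paper's exact statement: the norm of the matrix product averaged over all without-replacement orderings is bounded by the norm of the product averaged over with-replacement draws.

```latex
% Paraphrase of one form of the noncommutative AM--GM conjecture
% (Recht and Ré, 2012): for positive semidefinite A_1, ..., A_n,
\left\| \frac{1}{n!} \sum_{\sigma \in S_n}
        A_{\sigma(1)} A_{\sigma(2)} \cdots A_{\sigma(n)} \right\|
\;\le\;
\left\| \frac{1}{n^n} \sum_{j_1=1}^{n} \cdots \sum_{j_n=1}^{n}
        A_{j_1} A_{j_2} \cdots A_{j_n} \right\|
```

The practical distinction the abstract draws is between two ways of ordering SGD steps. Below is a minimal sketch of that distinction on a hypothetical least-squares problem; the objective, step size, and data here are illustrative assumptions, not the counterexample constructed in the paper.

```python
import numpy as np

def sgd(X, y, sampler, epochs=20, lr=0.1, seed=0):
    """Run single-example SGD on least squares; `sampler` picks the order."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in sampler(rng, n):
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of (x_i . w - y_i)^2 / 2
            w -= lr * grad
    return w

def with_replacement(rng, n):
    # With-replacement sampling: each step draws an index i.i.d. uniformly.
    return rng.integers(n, size=n)

def random_reshuffling(rng, n):
    # Random reshuffling: each epoch visits every example once, in random order.
    return rng.permutation(n)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 5))
    y = X @ rng.standard_normal(5)  # consistent linear system
    for name, sampler in [("with replacement", with_replacement),
                          ("random reshuffling", random_reshuffling)]:
        w = sgd(X, y, sampler)
        print(f"{name}: residual {np.linalg.norm(X @ w - y):.2e}")
```

The two runs differ only in which index each update uses; everything else is identical, which isolates the ordering effect the abstract describes.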

Cite

Text

De Sa. "Random Reshuffling Is Not Always Better." Neural Information Processing Systems, 2020.

Markdown

[De Sa. "Random Reshuffling Is Not Always Better." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/sa2020neurips-random/)

BibTeX

@inproceedings{sa2020neurips-random,
  title     = {{Random Reshuffling Is Not Always Better}},
  author    = {De Sa, Christopher M.},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/sa2020neurips-random/}
}