Better Generalization with Less Data Using Robust Gradient Descent
Abstract
For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
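The abstract describes a general recipe: replace the empirical mean of the loss gradients with a robust estimate, then take an ordinary steepest descent step. The sketch below is a minimal illustration of that recipe, using a coordinate-wise median-of-means aggregate of per-example gradients as a stand-in robust estimator; the squared-loss setup, function names, and block count k are illustrative assumptions, not the paper's exact construction (the paper's estimator differs in detail).

```python
import numpy as np

def median_of_means(grads, k=5, rng=None):
    """Coordinate-wise median-of-means over per-example gradients.

    grads : (n, d) array of per-example loss gradients.
    k     : number of blocks (an illustrative choice, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = grads.shape[0]
    blocks = np.array_split(rng.permutation(n), k)
    block_means = np.stack([grads[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

def robust_gradient_descent(X, y, steps=200, lr=0.1, k=5, seed=0):
    """Steepest descent on squared loss, with a robust gradient estimate
    standing in for the empirical mean gradient at every iteration."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residuals = X @ w - y              # shape (n,)
        grads = residuals[:, None] * X     # per-example gradients, shape (n, d)
        g_hat = median_of_means(grads, k=k, rng=rng)
        w -= lr * g_hat                    # ordinary descent step on the robust estimate
    return w

# Example with heavy-tailed noise: the robust update is less sensitive to the
# occasional extreme residual than a plain empirical-mean gradient would be.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    w_true = np.ones(5)
    y = X @ w_true + rng.standard_t(df=2.1, size=500)  # heavy-tailed noise
    print(robust_gradient_descent(X, y))
```

Because the robust estimate slots in wherever the empirical mean gradient would normally go, the same wrapper idea applies to any steepest descent procedure, as the abstract notes.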
Cite
Text
Holland and Ikeda. "Better Generalization with Less Data Using Robust Gradient Descent." International Conference on Machine Learning, 2019.
Markdown
[Holland and Ikeda. "Better Generalization with Less Data Using Robust Gradient Descent." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/holland2019icml-better/)
BibTeX
@inproceedings{holland2019icml-better,
  title     = {{Better Generalization with Less Data Using Robust Gradient Descent}},
  author    = {Holland, Matthew and Ikeda, Kazushi},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {2761--2770},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/holland2019icml-better/}
}