On Convergence and Generalization of Dropout Training

Abstract

We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves $\epsilon$-suboptimality in test error in $O(1/\epsilon)$ iterations.
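The sketch below is a minimal illustration, not the authors' code, of the setting described in the abstract: dropout training of a two-layer ReLU network with logistic loss. The width, dropout rate, learning rate, fixed second layer, and synthetic data are assumptions made for illustration only.

# A minimal sketch of dropout training for a two-layer ReLU network with
# logistic loss, written in PyTorch. Hyperparameters and data are illustrative.
import torch

torch.manual_seed(0)

d, m, n = 10, 512, 200        # input dim, hidden width, sample size (assumed)
p_drop = 0.5                  # dropout probability (assumed)

# Synthetic data with +/-1 labels separable with a positive margin, for illustration.
X = torch.randn(n, d)
y = torch.where(X[:, 0] > 0, 1.0, -1.0)

# Two-layer ReLU network f(x) = a^T ReLU(W x); only W is trained here and the
# second layer a is fixed at initialization (an assumed simplification).
W = torch.randn(m, d, requires_grad=True)
a = (2 * torch.randint(0, 2, (m,)) - 1).float() / m ** 0.5

opt = torch.optim.SGD([W], lr=0.1)

for step in range(500):
    # Inverted dropout: keep each hidden unit with probability 1 - p_drop, rescale.
    mask = (torch.rand(n, m) > p_drop).float() / (1 - p_drop)
    hidden = torch.relu(X @ W.t()) * mask
    out = hidden @ a                                      # network output f(x)
    loss = torch.nn.functional.softplus(-y * out).mean()  # logistic loss log(1+exp(-y f(x)))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Error is measured with the full (no-dropout) network.
with torch.no_grad():
    pred = torch.sign(torch.relu(X @ W.t()) @ a)
    print("0-1 error:", (pred != y).float().mean().item())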

Cite

Text

Mianjy and Arora. "On Convergence and Generalization of Dropout Training." Neural Information Processing Systems, 2020.

Markdown

[Mianjy and Arora. "On Convergence and Generalization of Dropout Training." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/mianjy2020neurips-convergence/)

BibTeX

@inproceedings{mianjy2020neurips-convergence,
  title     = {{On Convergence and Generalization of Dropout Training}},
  author    = {Mianjy, Poorya and Arora, Raman},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/mianjy2020neurips-convergence/}
}