On Convergence and Generalization of Dropout Training
Abstract
We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves $\epsilon$-suboptimality in the test error in $O(1/\epsilon)$ iterations.
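The following is a minimal sketch, not the authors' code, of the setting the abstract describes: dropout training of a two-layer ReLU network with logistic loss. The synthetic data, the fixed second layer, the hidden width, and all hyperparameters are illustrative assumptions, and the paper's margin condition is on the limiting kernel rather than on raw linear separability.

```python
# Minimal sketch: SGD with (inverted) dropout on the hidden layer of a
# two-layer ReLU network, trained with logistic loss. All sizes and
# hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic separable data (assumption, for illustration only).
n, d, m = 200, 10, 512                      # samples, input dim, hidden width (overparametrized)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.sign(X @ w_star)                     # labels in {-1, +1}

# Two-layer ReLU network f(x) = a^T relu(W x); here only W is trained and
# the output layer a is fixed at random (a simplifying assumption).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

p_keep = 0.5                                # dropout keep probability
lr = 0.1

for it in range(2000):
    i = rng.integers(n)
    x, yi = X[i], y[i]
    mask = (rng.random(m) < p_keep) / p_keep      # inverted-dropout mask on hidden units
    h = np.maximum(W @ x, 0.0)                    # ReLU hidden activations
    f = a @ (mask * h)                            # dropped-out network output
    z = yi * f                                    # margin; logistic loss is log(1 + exp(-z))
    g = -yi / (1.0 + np.exp(z))                   # d(logistic loss)/d f
    grad_W = np.outer(g * a * mask * (h > 0), x)  # only kept, active units contribute
    W -= lr * grad_W

# Evaluate the full (no-dropout) network on the training data.
pred = np.sign(np.maximum(X @ W.T, 0.0) @ a)
print("0-1 error:", np.mean(pred != y))
```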
Cite
Text
Mianjy and Arora. "On Convergence and Generalization of Dropout Training." Neural Information Processing Systems, 2020.
Markdown
[Mianjy and Arora. "On Convergence and Generalization of Dropout Training." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/mianjy2020neurips-convergence/)
BibTeX
@inproceedings{mianjy2020neurips-convergence,
title = {{On Convergence and Generalization of Dropout Training}},
author = {Mianjy, Poorya and Arora, Raman},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/mianjy2020neurips-convergence/}
}