MoReDrop: Dropout Without Dropping
Abstract
Dropout is a widely adopted technique that significantly improves the generalization of deep neural networks across various domains. However, the discrepancy in model configurations between the training and evaluation phases introduces a significant challenge: model distributional shift. In this study, we introduce Model Regularization for Dropout (MoReDrop). MoReDrop actively updates only the dense model during training, optimizing the dense model's loss function and thus eliminating the primary source of distributional shift. To further leverage the benefits of dropout, we add a regularizer derived from the output divergence between the dense model and its dropout sub-models; the sub-models receive passive updates because they share parameters with the dense model. To reduce computational demands, we also introduce a lightweight variant, MoReDropL, which applies dropout exclusively in the final layer. Experiments on several benchmarks across multiple domains consistently demonstrate the scalability, efficiency, and robustness of the proposed algorithms.
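To make the training recipe described above concrete, the following is a minimal PyTorch sketch of one possible MoReDrop-style update step, written only from the abstract. The toy model TinyNet, the choice of KL divergence as the regularizer, the weight reg_weight, and detaching the dropout branch so that only the dense forward pass is actively updated are all illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy classifier whose forward pass takes the dropout rate as an argument,
    so the dense model (p = 0) and its dropout sub-models share the same parameters."""
    def __init__(self, in_dim=32, hidden=64, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x, dropout_p=0.0):
        h = F.relu(self.fc1(x))
        # Applying dropout only here, before the final layer, mirrors the
        # last-layer-only setting of the MoReDropL variant.
        h = F.dropout(h, p=dropout_p, training=self.training)
        return self.fc2(h)

def moredrop_step(model, x, y, optimizer, drop_p=0.1, reg_weight=1.0):
    """One hypothetical MoReDrop-style training step."""
    model.train()
    logits_dense = model(x, dropout_p=0.0)    # dense model: dropout disabled
    logits_drop = model(x, dropout_p=drop_p)  # dropout sub-model, shared weights

    # The task loss is attached to the dense model only.
    task_loss = F.cross_entropy(logits_dense, y)

    # Regularizer from the divergence between dense and dropout outputs.
    # Detaching the dropout branch keeps the dense forward pass as the only
    # actively updated path; sub-models change only through the shared weights.
    reg = F.kl_div(
        F.log_softmax(logits_dense, dim=-1),
        F.softmax(logits_drop.detach(), dim=-1),
        reduction="batchmean",
    )

    loss = task_loss + reg_weight * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on random data:
model = TinyNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
moredrop_step(model, x, y, opt)

Note that both forward passes use the same parameters; whether the regularizer is a KL term or some other divergence, and whether the dropout branch is detached, are design choices the abstract does not specify.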
Cite
Jiang et al. "MoReDrop: Dropout Without Dropping." ICML 2024 Workshops: WANT, 2024. https://mlanthology.org/icmlw/2024/jiang2024icmlw-moredrop/

BibTeX
@inproceedings{jiang2024icmlw-moredrop,
title = {{MoReDrop: Dropout Without Dropping}},
author = {Jiang, Li and Li, Duo and Ding, Yichuan and Liu, Xue and Chan, Victor Wai Kin},
booktitle = {ICML 2024 Workshops: WANT},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/jiang2024icmlw-moredrop/}
}