DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity

Abstract

Warm-starting neural networks by initializing them with previously learned weights is appealing, as practical neural networks are often deployed under a continuous influx of new data. However, it often leads to *loss of plasticity*, where the network loses its ability to learn new information, resulting in worse generalization than training from scratch. This occurs even under stationary data distributions, and its underlying mechanism is poorly understood. We develop a framework emulating real-world neural network training and identify noise memorization as the primary cause of plasticity loss when warm-starting on stationary data. Motivated by this, we propose **Direction-Aware SHrinking (DASH)**, a method that mitigates plasticity loss by selectively forgetting memorized noise while preserving learned features. We validate our approach on vision tasks, demonstrating improvements in test accuracy and training efficiency.
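
To make the idea of "selectively forgetting memorized noise while preserving learned features" concrete, below is a minimal PyTorch sketch of a direction-aware shrinking step, based only on the abstract's description. The specific rule shown (scaling each neuron's incoming weights by their cosine alignment with the negative loss gradient on incoming data, floored at a constant `lam`) is an illustrative assumption, not necessarily the paper's exact algorithm; `dash_shrink_` and `lam` are hypothetical names.

```python
import torch

@torch.no_grad()
def dash_shrink_(weight: torch.Tensor, grad: torch.Tensor, lam: float = 0.3) -> None:
    """Shrink each row (one neuron's incoming weights) of `weight` in place.

    Rows aligned with the negative gradient on new data (plausibly useful
    features) are kept nearly intact; misaligned rows (plausibly memorized
    noise) are scaled toward zero, down to a floor of `lam`.
    """
    w = weight.view(weight.size(0), -1)        # (out_features, fan_in)
    g = -grad.view(grad.size(0), -1)           # negative gradient = descent direction
    cos = torch.cosine_similarity(w, g, dim=1)         # per-neuron alignment in [-1, 1]
    scale = cos.clamp(min=lam, max=1.0)                # keep aligned rows, shrink the rest
    weight.mul_(scale.view(-1, *([1] * (weight.dim() - 1))))  # broadcast over fan-in dims

# Usage sketch: before resuming training on newly arrived data, estimate the
# loss gradient on that data and shrink each layer's weights accordingly.
model = torch.nn.Linear(32, 10)                # stand-in for a warm-started network
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
dash_shrink_(model.weight.data, model.weight.grad)
model.zero_grad()                              # then continue training as usual
```

The intuition this sketch tries to capture: weight directions that still reduce the loss on fresh data are treated as learned features and preserved, while directions the new data does not support are shrunk, restoring the network's ability to learn.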

Cite

Text

Shin et al. "DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity." ICML 2024 Workshops: WANT, 2024.

Markdown

[Shin et al. "DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity." ICML 2024 Workshops: WANT, 2024.](https://mlanthology.org/icmlw/2024/shin2024icmlw-dash/)

BibTeX

@inproceedings{shin2024icmlw-dash,
  title     = {{DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity}},
  author    = {Shin, Baekrok and Oh, Junsoo and Cho, Hanseul and Yun, Chulhee},
  booktitle = {ICML 2024 Workshops: WANT},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/shin2024icmlw-dash/}
}