Study of Training Dynamics for Memory-Constrained Fine-Tuning

Abstract

Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme leveraging two key insights: layer importance for updates is architecture-dependent and determinable a priori, while dynamic stochastic channel selection provides superior gradient approximation compared to static approaches. We introduce a dynamic channel selection approach that stochastically resamples channels between epochs within preselected layers. Extensive experiments demonstrate that TraDy achieves state-of-the-art performance across various downstream tasks and architectures while maintaining strict memory constraints, reaching up to 99% activation sparsity, 95% weight derivative sparsity, and a 97% reduction in FLOPs for weight derivative computation.
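
The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer names, the `keep_fraction` parameter, the uniform sampling, and the helper names `resample_channels` and `mask_channel_grads` are all assumptions made for illustration. The sketch only shows the control flow, in which a fresh random subset of channels is drawn at each epoch inside a fixed set of preselected layers and weight gradients outside that subset are dropped.

```python
import random


def resample_channels(channels_per_layer, keep_fraction, rng):
    """Draw a fresh random subset of output channels for each preselected layer.

    Uniform sampling and a fixed keep_fraction are assumptions of this sketch.
    """
    selection = {}
    for name, n_channels in channels_per_layer.items():
        k = max(1, round(keep_fraction * n_channels))
        selection[name] = sorted(rng.sample(range(n_channels), k))
    return selection


def mask_channel_grads(grads, selected):
    """Zero the weight gradient of every output channel not in `selected`.

    `grads` is a list with one flat gradient list per output channel.
    """
    keep = set(selected)
    return [g if c in keep else [0.0] * len(g) for c, g in enumerate(grads)]


# Hypothetical preselected layers and their output-channel counts.
preselected = {"block3.conv": 64, "block4.conv": 128}
rng = random.Random(0)

for epoch in range(3):
    # Channels are resampled between epochs, not fixed once before training.
    selection = resample_channels(preselected, keep_fraction=0.05, rng=rng)
    print(epoch, {name: len(idx) for name, idx in selection.items()})
```

Masking an already computed gradient, as done here, does not save memory by itself. A memory-constrained implementation would presumably avoid storing activations and computing weight derivatives for the unselected channels in the first place.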

Cite

Text

Quélennec et al. "Study of Training Dynamics for Memory-Constrained Fine-Tuning." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{quelennec2026iclr-study,
  title     = {{Study of Training Dynamics for Memory-Constrained Fine-Tuning}},
  author    = {Quélennec, Aël and Hezbri, Nour and Mozharovskyi, Pavlo and Nguyen, Van-Tam and Tartaglione, Enzo},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/quelennec2026iclr-study/}
}