Advancing Dynamic Sparse Training by Exploring Optimization Opportunities

Abstract

Dynamic Sparse Training (DST) is an effective approach to the substantial training resource requirements posed by the ever-increasing size of Deep Neural Networks (DNNs). Characterized by its dynamic "train-prune-grow" schedule, DST implicitly develops a bi-level structure: it trains the weights while discovering a subnetwork topology. However, current DST algorithms consistently overlook this structure as a source of further optimization opportunities; they optimize only the weights and determine the masks heuristically. In this paper, we extensively study DST algorithms and argue that the training scheme of DST naturally forms a bi-level problem in which the weight and mask updates are interdependent. Based on this observation, we introduce BiDST, a novel efficient training framework that, for the first time, brings bi-level optimization methodology into the dynamic sparse training domain. Unlike traditional partially heuristic DST schemes, which suffer from sub-optimal mask-search efficiency and miss the opportunity to fully explore the topological space of neural networks, BiDST excels at discovering excellent sparse patterns by optimizing mask and weight simultaneously, achieving up to 2.62% higher accuracy, 2.1× faster execution, and 25× reduced overhead. Code available at https://github.com/jjsrf/BiDST-ICML2024.
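For context, the heuristic "prune-grow" mask update that the abstract contrasts BiDST against can be sketched as follows. This is a generic magnitude-prune / gradient-grow step in the style of prior DST methods, not the paper's bi-level algorithm; the function name and `update_frac` parameter are illustrative.

```python
import numpy as np

def prune_and_grow(weights, mask, grads, update_frac=0.1):
    """One heuristic DST mask update (a generic sketch, not BiDST itself):
    drop the smallest-magnitude active weights, then regrow the same
    number of connections where the gradient magnitude is largest,
    so the overall sparsity level is preserved."""
    n_active = int(mask.sum())
    k = max(1, int(update_frac * n_active))

    # Prune: deactivate the k active weights with the smallest magnitude.
    active_idx = np.flatnonzero(mask)
    drop = active_idx[np.argsort(np.abs(weights[active_idx]))[:k]]
    mask[drop] = 0
    weights[drop] = 0.0

    # Grow: activate the k inactive positions with the largest gradient
    # magnitude; regrown weights start at zero, as is common in DST.
    inactive_idx = np.flatnonzero(mask == 0)
    grow = inactive_idx[np.argsort(-np.abs(grads[inactive_idx]))[:k]]
    mask[grow] = 1
    return weights, mask
```

Because the mask is updated by these fixed rules rather than optimized, weight training and mask search stay decoupled; BiDST's bi-level formulation instead optimizes both jointly.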

Cite

Text

Ji et al. "Advancing Dynamic Sparse Training by Exploring Optimization Opportunities." International Conference on Machine Learning, 2024.

Markdown

[Ji et al. "Advancing Dynamic Sparse Training by Exploring Optimization Opportunities." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ji2024icml-advancing/)

BibTeX

@inproceedings{ji2024icml-advancing,
  title     = {{Advancing Dynamic Sparse Training by Exploring Optimization Opportunities}},
  author    = {Ji, Jie and Li, Gen and Yin, Lu and Qin, Minghai and Yuan, Geng and Guo, Linke and Liu, Shiwei and Ma, Xiaolong},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {21606--21619},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/ji2024icml-advancing/}
}