Spred: Solving L1 Penalty with SGD

Abstract

We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal directly generalizes previous observations that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, spred, is an exact differentiable solver of the $L_1$-penalized objective and that the reparametrization trick is completely "benign" for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks for gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) the neural network compression task, where previous attempts at applying the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
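
The abstract only names the reparametrization trick at a high level. The sketch below is a minimal, hypothetical illustration (not the authors' released implementation) of the kind of construction it refers to, assuming the element-wise product reparametrization w = u * v, in which plain SGD with weight decay on u and v plays the role of an $L_1$ penalty on w; the data, variable names, and hyperparameters are invented for the example.

# Hypothetical sketch: L1-penalized linear regression solved by
# reparametrizing the weight vector as w = u * v (element-wise) and
# running plain SGD with weight decay on u and v.
import torch

torch.manual_seed(0)

# Synthetic sparse regression problem.
n, d = 256, 50
X = torch.randn(n, d)
w_true = torch.zeros(d)
w_true[:5] = torch.randn(5)              # only 5 informative features
y = X @ w_true + 0.01 * torch.randn(n)

lam = 0.1                                # target L1 strength: lam * ||w||_1

# Redundant reparametrization: w = u * v.
u = torch.randn(d, requires_grad=True)
v = torch.randn(d, requires_grad=True)

# Weight decay of strength lam on u and v corresponds, at the minimum,
# to an L1 penalty of strength lam on w = u * v, because
# min_{u*v = w} (lam/2)(||u||^2 + ||v||^2) = lam * ||w||_1.
opt = torch.optim.SGD([u, v], lr=1e-2, weight_decay=lam)

for step in range(5000):
    opt.zero_grad()
    loss = ((X @ (u * v) - y) ** 2).mean()   # ordinary differentiable loss
    loss.backward()
    opt.step()

w_hat = (u * v).detach()
print("entries with |w| > 1e-3:", int((w_hat.abs() > 1e-3).sum()))

Only the parametrization and the optimizer's weight-decay setting are specific to the trick; the loss itself is an ordinary differentiable objective, which is what allows the same recipe to be applied to neural networks.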

Cite

Text

Ziyin and Wang. "Spred: Solving L1 Penalty with SGD." International Conference on Machine Learning, 2023.

Markdown

[Ziyin and Wang. "Spred: Solving L1 Penalty with SGD." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/ziyin2023icml-spred/)

BibTeX

@inproceedings{ziyin2023icml-spred,
  title     = {{Spred: Solving L1 Penalty with SGD}},
  author    = {Ziyin, Liu and Wang, Zihao},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {43407--43422},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/ziyin2023icml-spred/}
}