Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices

Abstract

Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts to develop alternatives have focused on a small number of hand-crafted structured matrices, and have neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve equal or better performance than dense layers as a function of training compute. To further improve their compute efficiency, we develop a natural extension of these performant structures that converts them into a sparse Mixture-of-Experts layer. The resulting layer significantly outperforms dense layers in compute-optimal training efficiency for GPT-2 language models.
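To make the framing concrete, here is a minimal sketch (not from the paper; it assumes NumPy and small illustrative dimensions) of how two of the structures named in the abstract, low-rank and Kronecker, can each be applied to an input via Einstein summations instead of a dense matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = d2 = 4          # illustrative factor dimensions
d = d1 * d2          # layer width
x = rng.standard_normal(d)

# Low-rank structure: W = U V^T with rank r << d.
# Applying it as two small einsums costs O(d*r) instead of O(d^2).
r = 2
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))
y_lowrank = np.einsum("dr,r->d", U, np.einsum("er,e->r", V, x))
assert np.allclose(y_lowrank, (U @ V.T) @ x)

# Kronecker structure: W = A ⊗ B, applied by reshaping x into a
# (d1, d2) grid and contracting each axis with a small factor matrix.
A = rng.standard_normal((d1, d1))
B = rng.standard_normal((d2, d2))
y_kron = np.einsum("ac,bd,cd->ab", A, B, x.reshape(d1, d2)).reshape(d)
assert np.allclose(y_kron, np.kron(A, B) @ x)
```

Both operators are instances of the same pattern, a fixed einsum string with learnable factor tensors, which is what lets the paper's framework treat the space of such structures as a continuous search space.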

Cite

Text

Potapczynski et al. "Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices." Neural Information Processing Systems, 2024. doi:10.52202/079017-0126

Markdown

[Potapczynski et al. "Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/potapczynski2024neurips-searching/) doi:10.52202/079017-0126

BibTeX

@inproceedings{potapczynski2024neurips-searching,
  title     = {{Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices}},
  author    = {Potapczynski, Andres and Qiu, Shikai and Finzi, Marc and Ferri, Christopher and Chen, Zixi and Goldblum, Micah and Bruss, C. Bayan and De Sa, Christopher and Wilson, Andrew Gordon},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0126},
  url       = {https://mlanthology.org/neurips/2024/potapczynski2024neurips-searching/}
}