DYAD: A Descriptive yet Abjuring Density Efficient Approximation to Linear Neural Network Layers
Abstract
We devise, implement, and performance-assess DYAD, a layer which can serve as a faster and more memory-efficient approximate replacement for linear layers (nn.Linear() in PyTorch). These layers appear in common subcomponents, such as the feedforward (FF) module of Transformers. DYAD is based on a bespoke near-sparse matrix structure which approximates the dense "weight" matrix W that matrix-multiplies the input in the typical realization of such a layer, a.k.a. DENSE. Our near-sparse matrix structure decomposes into a sum of 2 matrices, each permutable to a block-sparse counterpart. These can be represented as 3D tensors, which together allow faster matrix multiplication with the mini-batched input matrix than DENSE: O(rows(W) × cols(W)) → O(rows(W) × cols(W) / (# of blocks)). As the crux of our experiments, we pretrain both DYAD and DENSE variants of 2 sizes of the OPT architecture and 1 size of the Pythia architecture, including at different token scales of the BabyLM benchmark. We find DYAD to be competitive (≥ 90% of DENSE performance) on zero-shot (e.g., BLiMP), few-shot (OPENLM), and finetuning (GLUE) benchmarks, while being at least 7-15% faster to train on-GPU even at the 125M scale, with larger speedups surfacing at increasing scale and model width.
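To make the structure concrete, below is a minimal PyTorch sketch of a DYAD-style layer under one reading of the abstract: the weight is approximated as the sum of two components, each block-diagonal up to a permutation and stored as a 3D tensor of per-block weights, so a batched matmul costs O(rows(W) × cols(W) / #blocks) per component instead of O(rows(W) × cols(W)). The class name, the choice of a simple block roll as the permutation, and the initialization are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class DyadLinearSketch(nn.Module):
    """Hypothetical sketch of a DYAD-style approximate linear layer.
    W is approximated by the sum of two block-diagonal matrices; the
    second is applied after an (assumed) fixed permutation of input
    blocks so information mixes across block boundaries."""
    def __init__(self, in_features, out_features, num_blocks):
        super().__init__()
        assert in_features % num_blocks == 0 and out_features % num_blocks == 0
        self.num_blocks = num_blocks
        bi, bo = in_features // num_blocks, out_features // num_blocks
        # Each component is a 3D tensor of per-block weights, so each
        # component costs rows(W) * cols(W) / num_blocks multiply-adds.
        self.blocks_a = nn.Parameter(torch.randn(num_blocks, bi, bo) / bi ** 0.5)
        self.blocks_b = nn.Parameter(torch.randn(num_blocks, bi, bo) / bi ** 0.5)

    def forward(self, x):
        # x: (batch, in_features) -> (num_blocks, batch, block_in)
        b = x.shape[0]
        xb = x.view(b, self.num_blocks, -1).transpose(0, 1)
        ya = torch.bmm(xb, self.blocks_a)                  # block-diagonal term
        # Assumed permutation: rotate the blocks by one before the second term.
        yb = torch.bmm(xb.roll(1, dims=0), self.blocks_b)
        return (ya + yb).transpose(0, 1).reshape(b, -1)

# Example: versus nn.Linear(512, 512), which does 512 * 512 multiply-adds
# per token, this sketch does 2 * (512 * 512 / 8) with num_blocks = 8.
layer = DyadLinearSketch(512, 512, num_blocks=8)
y = layer(torch.randn(4, 512))  # -> shape (4, 512)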
Cite
Text
Chandy et al. "DYAD: A Descriptive yet Abjuring Density Efficient Approximation to Linear Neural Network Layers." NeurIPS 2023 Workshops: WANT, 2023.
Markdown
[Chandy et al. "DYAD: A Descriptive yet Abjuring Density Efficient Approximation to Linear Neural Network Layers." NeurIPS 2023 Workshops: WANT, 2023.](https://mlanthology.org/neuripsw/2023/chandy2023neuripsw-dyad/)
BibTeX
@inproceedings{chandy2023neuripsw-dyad,
title = {{DYAD: A Descriptive yet Abjuring Density Efficient Approximation to Linear Neural Network Layers}},
author = {Chandy, Sarin Eapen and Gangal, Varun Prashant and Yang, Yi and Maggiotti, Gabriel},
booktitle = {NeurIPS 2023 Workshops: WANT},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/chandy2023neuripsw-dyad/}
}