Learning the Transformer Kernel

Abstract

In this work we introduce KL-TRANSFORMER, a generic, scalable, data-driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only allows a generic kernel to be learned end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KL-TRANSFORMERs achieve performance comparable to existing efficient Transformer architectures, both in terms of accuracy and computational efficiency. Our study also demonstrates that the choice of kernel has a substantial impact on performance, and that kernel-learning variants are competitive alternatives to fixed-kernel Transformers on both long- and short-sequence tasks.
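The abstract's core idea, a kernel written as a dot product of spectral feature maps so that attention can be computed in linear time, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes random Fourier features with frequencies drawn from a single Gaussian, and the names `sample_frequencies`, `spectral_features`, `mu`, and `log_sigma` are hypothetical. In a kernel-learning setup, `mu` and `log_sigma` would be trained by gradient descent in an autodiff framework; here they are fixed arrays.

```python
import numpy as np

def sample_frequencies(mu, log_sigma, m, rng):
    # Reparameterized draw: omega = mu + sigma * eps, with eps ~ N(0, I).
    # mu and log_sigma (hypothetical names) stand in for learnable
    # parameters of the spectral distribution.
    eps = rng.standard_normal((m, mu.shape[0]))
    return mu + np.exp(log_sigma) * eps              # (m, d)

def spectral_features(x, omega):
    # Random Fourier feature map: phi(x) = [cos(x W^T), sin(x W^T)] / sqrt(m)
    proj = x @ omega.T                               # (n, m)
    m = omega.shape[0]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(m)

def linear_kernel_attention(q, k, v, omega, eps=1e-6):
    # Linear in sequence length n: the (n, n) score matrix is never formed.
    phi_q = spectral_features(q, omega)              # (n, 2m)
    phi_k = spectral_features(k, omega)              # (n, 2m)
    kv = phi_k.T @ v                                 # (2m, d_v)
    z = phi_k.sum(axis=0)                            # (2m,)
    return (phi_q @ kv) / ((phi_q @ z)[:, None] + eps)

def quadratic_kernel_attention(q, k, v, omega, eps=1e-6):
    # Reference version that materializes the full (n, n) kernel matrix.
    phi_q = spectral_features(q, omega)
    phi_k = spectral_features(k, omega)
    scores = phi_q @ phi_k.T                         # (n, n)
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)

# Example usage with small random inputs (shapes are arbitrary choices).
rng = np.random.default_rng(0)
n, d, m = 16, 8, 64
omega = sample_frequencies(np.zeros(d), np.zeros(d), m, rng)
q = 0.1 * rng.standard_normal((n, d))
k = 0.1 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
out = linear_kernel_attention(q, k, v, omega)        # shape (n, d)
```

Both functions compute the same quantity. The linear version only regroups the matrix products so that the cost grows with n rather than n squared. One caveat of this simplified feature map is that trigonometric features can be negative, so the normalizer is not guaranteed to stay positive for arbitrary inputs.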

Cite

Text

Chowdhury et al. "Learning the Transformer Kernel." Transactions on Machine Learning Research, 2022.

Markdown

[Chowdhury et al. "Learning the Transformer Kernel." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/chowdhury2022tmlr-learning/)

BibTeX

@article{chowdhury2022tmlr-learning,
  title     = {{Learning the Transformer Kernel}},
  author    = {Chowdhury, Sankalan Pal and Solomou, Adamos and Dubey, Kumar Avinava and Sachan, Mrinmaya},
  journal   = {Transactions on Machine Learning Research},
  year      = {2022},
  url       = {https://mlanthology.org/tmlr/2022/chowdhury2022tmlr-learning/}
}