ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation

Abstract

Extending the analysis of Nanda et al. (2023) and Zhong et al. (2023), we offer an end-to-end interpretation of the one-layer, MLP-only modular addition transformer model with symmetric embeddings. We present a clear and mathematically rigorous description of the computation at each layer, in preparation for the proofs-based verification approach set out in concurrent work under review. In doing so, we present a new interpretation of MLP layers: that they implement quadrature schemes to carry out numerical integration, and we provide anecdotal and mathematical evidence in support. This overturns the existing idea that neurons in neural networks are merely on-off switches that test for the presence of "features"; instead, multiple neurons can be combined in non-trivial ways to produce continuous quantities.
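As a rough illustration of the quadrature idea (our own sketch, not code from the paper): a population of ReLU neurons whose preactivations are the same periodic function evaluated at evenly spaced phase offsets behaves like a Riemann-sum quadrature rule, so the neurons jointly compute a continuous quantity rather than acting as independent on-off feature detectors. The phases, weights, and the specific integral below are illustrative assumptions, not the learned values from the paper's model.

```python
import numpy as np

# Sketch (illustrative, not the paper's code): N ReLU "neurons" with
# preactivations cos(theta - phi_i) at evenly spaced phases phi_i.
# Their average approximates (1/2pi) * integral of ReLU(cos u) du = 1/pi,
# i.e. the population acts like an equal-weight quadrature rule whose
# output is (nearly) independent of the query angle theta.
N = 64
phases = 2 * np.pi * np.arange(N) / N      # evenly spaced quadrature nodes
thetas = np.linspace(0, 2 * np.pi, 7)      # arbitrary query angles

for theta in thetas:
    preact = np.cos(theta - phases)        # neuron preactivations
    quad = np.maximum(preact, 0.0).mean()  # ReLU, then equal-weight average
    print(f"theta={theta:.2f}  quadrature={quad:.5f}  exact 1/pi={1/np.pi:.5f}")
```

In a trained model the phases and readout weights are learned and the quadrature output depends on the input tokens; this toy version only shows that summing ReLU units over evenly spaced phases approximates an integral.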

Cite

Text

Yip et al. "ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation." ICML 2024 Workshops: MI, 2024.

Markdown

[Yip et al. "ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/yip2024icmlw-relu/)

BibTeX

@inproceedings{yip2024icmlw-relu,
  title     = {{ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation}},
  author    = {Yip, Chun Hei and Agrawal, Rajashree and Gross, Jason},
  booktitle = {ICML 2024 Workshops: MI},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/yip2024icmlw-relu/}
}