ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation
Abstract
Extending the analysis of Nanda et al. (2023) and Zhong et al. (2023), we offer an end-to-end interpretation of the one-layer, MLP-only modular addition transformer model with symmetric embeddings. We present a clear and mathematically rigorous description of the computation at each layer, in preparation for the proof-based verification approach set out in concurrent work under review. In doing so, we present a new interpretation of MLP layers: that they implement quadrature schemes to carry out numerical integration, and we provide anecdotal and mathematical evidence in support. This overturns the existing idea that neurons in neural networks are merely on-off switches that test for the presence of "features"; instead, multiple neurons can be combined in non-trivial ways to produce continuous quantities.
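To make the quadrature claim concrete, below is a minimal, self-contained sketch; it is an illustration of the general idea, not the construction from the paper. Three ReLU units per grid node form a piecewise-linear "hat" basis, and a fixed linear readout over those units integrates the resulting interpolant, reproducing the trapezoidal quadrature rule. The integrand (cos), the interval, and the node count are arbitrary choices made for the demonstration.

```python
import numpy as np

# Illustrative sketch (not the paper's exact construction): three ReLU units per
# grid node form a piecewise-linear "hat" basis function. A fixed linear readout
# over these ReLU activations integrates the resulting interpolant, which equals
# the trapezoidal quadrature rule applied to the interpolated function.

def relu(z):
    return np.maximum(z, 0.0)

a, b, n = 0.0, np.pi / 2, 16        # integration interval and number of panels
nodes = np.linspace(a, b, n + 1)    # quadrature nodes
h = (b - a) / n                     # uniform node spacing
f = np.cos                          # example integrand; exact integral is 1.0

def hat(x, c):
    """Hat function centred at node c, width h, built from three ReLU units."""
    return (relu(x - (c - h)) - 2.0 * relu(x - c) + relu(x - (c + h))) / h

def interpolant(x):
    """Piecewise-linear interpolant of f, expressed as a sum of ReLU hats."""
    return sum(f(c) * hat(x, c) for c in nodes)

# Integrate the ReLU-based interpolant on a fine grid ...
xs = np.linspace(a, b, 20001)
dx = xs[1] - xs[0]
vals = interpolant(xs)
integral_of_interpolant = dx * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])

# ... and compare with the trapezoidal rule applied directly to f, and the truth.
trapezoid_rule = h * (0.5 * f(nodes[0]) + f(nodes[1:-1]).sum() + 0.5 * f(nodes[-1]))

print(integral_of_interpolant)      # ~0.9992
print(trapezoid_rule)               # ~0.9992
print(np.sin(b) - np.sin(a))        # 1.0 (exact value of the integral)
```

The toy example only shows that a linear readout over ReLU activations can encode quadrature weights; the paper's claim concerns how neurons in the trained modular addition model combine to perform an analogous integral-like computation.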
Cite
Yip et al. "ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation." ICML 2024 Workshops: MI, 2024.
BibTeX
@inproceedings{yip2024icmlw-relu,
title = {{ReLU MLPs Can Compute Numerical Integration: Mechanistic Interpretation of a Non-Linear Activation}},
author = {Yip, Chun Hei and Agrawal, Rajashree and Gross, Jason},
booktitle = {ICML 2024 Workshops: MI},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/yip2024icmlw-relu/}
}