Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping

Abstract

Rectified linear unit (ReLU) based neural networks (NNs) are recognised for their remarkable accuracy. However, the decision-making processes of these networks are often complex and difficult to understand. This complexity can lead to challenges in error identification, establishing trust, and conducting thorough analyses. Existing methods often fail to provide clear insights into the actual computations occurring within each layer of these networks. To address this challenge, this study introduces a mechanistic interpretability method called ReLU Region Reasoning (Re3). This method uses the known piecewise-linear characteristics of ReLU networks to offer insights into neuron activation and accurately assess how each feature contributes to the final output and probability. Re3 effectively determines neuron activations and evaluates the contribution of each feature within a specified linear region. Experiments conducted on multiple benchmark datasets, including both tabular and image data, demonstrate that Re3 can replicate individual predictions without error, align feature importance with domain expertise, and maintain consistency with current explanatory methods, thereby avoiding the typical randomness. Analysing neurons reveals activation sparsity and identifies dominant units, thus providing clear targets for model simplification and troubleshooting. By ensuring transparency and algebraic accessibility in each stage of a ReLU-based NN’s decision process, Re3 can be a valuable practical tool for achieving precise mechanistic interpretability.
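The piecewise-affine property the abstract relies on can be illustrated with a short NumPy sketch. This is not the authors' Re3 implementation; it only shows the generic idea that, once the on/off pattern of the ReLU units is fixed for a given input, the network reduces to a single affine map whose rows give per-feature contributions to each logit. The function names (`relu_forward`, `local_affine_map`), the layer sizes, and the random weights are assumptions made for illustration.

```python
import numpy as np

def relu_forward(x, weights, biases):
    """Forward pass of a ReLU MLP with a linear output layer.

    Returns the logits and the 0/1 activation pattern of each hidden layer.
    """
    h = x
    patterns = []
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ h + b
        mask = (z > 0).astype(z.dtype)  # which neurons are active for this input
        patterns.append(mask)
        h = z * mask                    # ReLU expressed as masking
    logits = weights[-1] @ h + biases[-1]
    return logits, patterns

def local_affine_map(weights, biases, patterns):
    """Collapse the network into logits = A @ x + c for a fixed activation pattern."""
    d_in = weights[0].shape[1]
    A = np.eye(d_in)
    c = np.zeros(d_in)
    for W, b, mask in zip(weights[:-1], biases[:-1], patterns):
        A = mask[:, None] * (W @ A)     # zero out rows of inactive neurons
        c = mask * (W @ c + b)
    A = weights[-1] @ A
    c = weights[-1] @ c + biases[-1]
    return A, c

# Hypothetical toy network: 4 input features, two hidden layers, 3 output logits.
rng = np.random.default_rng(0)
sizes = [4, 8, 6, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
x = rng.standard_normal(sizes[0])

logits, patterns = relu_forward(x, weights, biases)
A, c = local_affine_map(weights, biases, patterns)

# contributions[k, j] is the share of logit k attributed to input feature j
# inside the linear region that contains x.
contributions = A * x
```

Under these assumptions, `A @ x + c` matches `logits` up to floating-point error, and each row of `contributions` plus the matching entry of `c` sums to the corresponding logit, which is the sense in which a prediction can be reproduced exactly within one linear region. The map `(A, c)` is only valid for inputs that share the same activation pattern as `x`.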

Cite

Text

Barua et al. "Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping." Machine Learning, 2026. doi:10.1007/S10994-025-06957-0

Markdown

[Barua et al. "Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping." Machine Learning, 2026.](https://mlanthology.org/mlj/2026/barua2026mlj-mechanistic/) doi:10.1007/S10994-025-06957-0

BibTeX

@article{barua2026mlj-mechanistic,
  title     = {{Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping}},
  author    = {Barua, Arnab and Ahmed, Mobyen Uddin and Begum, Shahina},
  journal   = {Machine Learning},
  year      = {2026},
  pages     = {17},
  doi       = {10.1007/S10994-025-06957-0},
  volume    = {115},
  url       = {https://mlanthology.org/mlj/2026/barua2026mlj-mechanistic/}
}