Interpreting Arithmetic Reasoning in Large Language Models Using Game-Theoretic Interactions

Abstract

In recent years, large language models (LLMs) have made significant advances in arithmetic reasoning. However, the internal mechanism by which LLMs solve arithmetic problems remains unclear. In this paper, we propose to explain arithmetic reasoning in LLMs using game-theoretic interactions. Specifically, we disentangle the output score of an LLM into numerous interactions between the input words, and we quantify the different types of interactions encoded during forward propagation to explore how LLMs solve arithmetic problems. We find that (1) the internal mechanism by which LLMs solve simple one-operator arithmetic problems lies in their capability to encode operand-operator interactions and high-order interactions from input samples; moreover, LLMs with weak one-operator arithmetic capabilities focus more on background interactions. (2) The internal mechanism by which LLMs solve relatively complex two-operator arithmetic problems lies in their capability to encode operator interactions and operand interactions from input samples. (3) We explain the task-specific nature of the LoRA method from the perspective of interactions.
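
The abstract does not spell out the exact interaction metric. In the recent interpretability literature, "game-theoretic interactions" usually refers to Harsanyi-style interactions I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T), whose sum over all word subsets S recovers the model output v(N), which is what "disentangling the output score" refers to. The sketch below illustrates that decomposition under this assumption only; it is not the paper's code, and `model_output`, the `[MASK]` baseline, and the masking scheme are hypothetical choices made for illustration.

```python
# Minimal sketch of Harsanyi-style game-theoretic interactions over input words.
# Assumptions (not from the paper): v(T) is the model score on an input whose
# words outside T are replaced by a baseline token, and `model_output` is a
# user-supplied scoring function mapping a token list to a scalar.
from itertools import chain, combinations


def subsets(indices):
    """All subsets of a list of token indices (exponential; small sets only)."""
    return chain.from_iterable(combinations(indices, r) for r in range(len(indices) + 1))


def value(model_output, tokens, kept, baseline="[MASK]"):
    """v(T): model score with every token outside `kept` masked by the baseline."""
    masked = [tok if i in kept else baseline for i, tok in enumerate(tokens)]
    return model_output(masked)


def harsanyi_interaction(model_output, tokens, S, baseline="[MASK]"):
    """I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    S = set(S)
    total = 0.0
    for T in subsets(sorted(S)):
        total += (-1) ** (len(S) - len(T)) * value(model_output, tokens, set(T), baseline)
    return total


if __name__ == "__main__":
    # Toy value function for demonstration only: score is 1.0 when both operands
    # and the operator of "3 + 5 =" are unmasked (a stand-in for the LLM's
    # confidence in the correct answer token), else 0.0.
    def toy_model_output(tokens):
        return float(all(t != "[MASK]" for t in tokens[:3]))

    tokens = ["3", "+", "5", "="]
    # Interaction among {operand, operator, operand}; prints 1.0 for the toy game.
    print(harsanyi_interaction(toy_model_output, tokens, S={0, 1, 2}))
```

Under this formulation, the paper's categories (operand-operator, operator, operand, high-order, and background interactions) would correspond to grouping the sets S by which kinds of words they contain and by their size |S|; that grouping is inferred from the abstract, not specified here.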

Cite

Text

Wen et al. "Interpreting Arithmetic Reasoning in Large Language Models Using Game-Theoretic Interactions." Advances in Neural Information Processing Systems, 2025.

Markdown

[Wen et al. "Interpreting Arithmetic Reasoning in Large Language Models Using Game-Theoretic Interactions." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wen2025neurips-interpreting/)

BibTeX

@inproceedings{wen2025neurips-interpreting,
  title     = {{Interpreting Arithmetic Reasoning in Large Language Models Using Game-Theoretic Interactions}},
  author    = {Wen, Leilei and Zheng, Liwei and Li, Hongda and Sun, Lijun and Wei, Zhihua and Shen, Wen},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wen2025neurips-interpreting/}
}