Approximating Shapley Explanations in Reinforcement Learning

Abstract

Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical settings. Shapley values from cooperative game theory provide a principled framework for explaining reinforcement learning; however, the computational cost of Shapley explanations is an obstacle to their use. We introduce FastSVERL, a scalable method for explaining reinforcement learning by approximating Shapley values. FastSVERL is designed to handle the unique challenges of reinforcement learning, including temporal dependencies across multi-step trajectories, learning from off-policy data, and adapting to evolving agent behaviours in real time. FastSVERL introduces a practical, scalable approach for principled and rigorous interpretability in reinforcement learning.
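To illustrate the computational obstacle the abstract refers to: the exact Shapley value of a feature averages its marginal contribution over all 2^n coalitions, which is intractable for more than a handful of features. A standard workaround is Monte Carlo permutation sampling. The sketch below shows that generic estimator on a toy coalition game; it is illustrative only and is not the FastSVERL algorithm, which instead learns an approximation from agent data.

```python
import random

def shapley_monte_carlo(features, value_fn, num_samples=1000, seed=0):
    """Estimate Shapley values by sampling random feature orderings.

    `value_fn` maps a frozenset of feature indices to a scalar payoff.
    This is the generic permutation-sampling estimator, shown only to
    illustrate the cost/accuracy trade-off; it is not FastSVERL.
    """
    rng = random.Random(seed)
    estimates = {f: 0.0 for f in features}
    for _ in range(num_samples):
        order = list(features)
        rng.shuffle(order)  # one random ordering of the features
        coalition = frozenset()
        prev_value = value_fn(coalition)
        for f in order:
            # Marginal contribution of f to the coalition built so far.
            coalition = coalition | {f}
            new_value = value_fn(coalition)
            estimates[f] += new_value - prev_value
            prev_value = new_value
    return {f: total / num_samples for f, total in estimates.items()}

# Toy additive game: the payoff is a weighted sum of present features,
# so each feature's Shapley value equals its weight exactly.
weights = {0: 1.0, 1: 2.0, 2: -0.5}
value = lambda coalition: sum(weights[i] for i in coalition)
phi = shapley_monte_carlo(list(weights), value, num_samples=500)
```

Each sampled permutation costs n evaluations of `value_fn`, so the estimator trades exponential exact computation for a controllable sampling budget; in reinforcement learning, each evaluation would itself require simulating or modelling the agent, which is the expense FastSVERL targets.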

Cite

Text

Beechey and Şimşek. "Approximating Shapley Explanations in Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Beechey and Şimşek. "Approximating Shapley Explanations in Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/beechey2025neurips-approximating/)

BibTeX

@inproceedings{beechey2025neurips-approximating,
  title     = {{Approximating Shapley Explanations in Reinforcement Learning}},
  author    = {Beechey, Daniel and Şimşek, Özgür},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/beechey2025neurips-approximating/}
}