Approximating Shapley Explanations in Reinforcement Learning

Abstract

Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical settings. Shapley values from cooperative game theory provide a principled framework for explaining reinforcement learning; however, the computational cost of Shapley explanations is an obstacle to their use. We introduce FastSVERL, a scalable method for explaining reinforcement learning by approximating Shapley values. FastSVERL is designed to handle the unique challenges of reinforcement learning, including temporal dependencies across multi-step trajectories, learning from off-policy data, and adapting to evolving agent behaviours in real time. FastSVERL introduces a practical, scalable approach for principled and rigorous interpretability in reinforcement learning.
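To illustrate the computational obstacle the abstract refers to: the exact Shapley value of a feature averages its marginal contribution over all 2^n coalitions, which is intractable for more than a handful of features. A standard workaround is Monte Carlo permutation sampling. The sketch below shows that generic estimator on a toy coalition game; it is illustrative only and is not the FastSVERL algorithm, which instead learns an approximation from agent data.

```python
import random

def shapley_monte_carlo(features, value_fn, num_samples=1000, seed=0):
    """Estimate Shapley values by sampling random feature orderings.

    `value_fn` maps a frozenset of feature indices to a scalar payoff.
    This is the generic permutation-sampling estimator, shown only to
    illustrate the cost/accuracy trade-off; it is not FastSVERL.
    """
    rng = random.Random(seed)
    estimates = {f: 0.0 for f in features}
    for _ in range(num_samples):
        order = list(features)
        rng.shuffle(order)  # one random ordering of the features
        coalition = frozenset()
        prev_value = value_fn(coalition)
        for f in order:
            # Marginal contribution of f to the coalition built so far.
            coalition = coalition | {f}
            new_value = value_fn(coalition)
            estimates[f] += new_value - prev_value
            prev_value = new_value
    return {f: total / num_samples for f, total in estimates.items()}

# Toy additive game: the payoff is a weighted sum of present features,
# so each feature's Shapley value equals its weight exactly.
weights = {0: 1.0, 1: 2.0, 2: -0.5}
value = lambda coalition: sum(weights[i] for i in coalition)
phi = shapley_monte_carlo(list(weights), value, num_samples=500)
```

Each sampled permutation costs n evaluations of `value_fn`, so the estimator trades exponential exact computation for a controllable sampling budget; in reinforcement learning, each evaluation would itself require simulating or modelling the agent, which is the expense FastSVERL targets.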

Cite

Text

Beechey and Şimşek. "Approximating Shapley Explanations in Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Beechey and Şimşek. "Approximating Shapley Explanations in Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/beechey2025neurips-approximating/)

BibTeX

@inproceedings{beechey2025neurips-approximating,
  title     = {{Approximating Shapley Explanations in Reinforcement Learning}},
  author    = {Beechey, Daniel and Şimşek, Özgür},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/beechey2025neurips-approximating/}
}