Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes

Abstract

Despite the recent success of deep learning models in research settings, their application in sensitive domains remains limited because of their opaque decision-making processes. Taking up this challenge, researchers have proposed various eXplainable AI (XAI) techniques designed to calibrate trust in black-box models and improve their understandability, with the vast majority of work focused on supervised learning. Here, we focus on making an "interpretable-by-design" deep reinforcement learning agent which is forced to use human-friendly prototypes in its decisions, thus making its reasoning process clear. Our proposed method, dubbed Prototype-Wrapper Network (PW-Net), wraps around any neural agent backbone, and results indicate that it does not worsen performance relative to black-box models. Most importantly, we found in a user study that PW-Nets supported better trust calibration and task performance relative to standard interpretability approaches and black boxes.
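
To make the wrapping idea concrete, below is a minimal PyTorch sketch of a prototype-wrapper in the spirit of PW-Net. This is an illustration under stated assumptions, not the authors' implementation: the class and parameter names (PWNet, proto_dim, and so on) are invented here, the backbone is assumed to map observations to a single latent vector, and the similarity function shown is the standard ProtoPNet-style log-activation; consult the paper for the exact architecture and training procedure.

import torch
import torch.nn as nn

class PWNet(nn.Module):
    # Illustrative sketch (not the authors' code): wrap a frozen agent
    # backbone, compare its projected latent state to prototype vectors,
    # and combine the similarities linearly into action outputs.
    def __init__(self, backbone, latent_dim, proto_dim, num_prototypes, num_actions):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the pretrained agent fixed
        self.project = nn.Linear(latent_dim, proto_dim)
        # Prototype vectors; each is meant to end up anchored to the
        # encoding of a human-chosen, recognizable example state.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, proto_dim))
        self.head = nn.Linear(num_prototypes, num_actions, bias=False)

    def forward(self, obs):
        z = self.project(self.backbone(obs))       # (batch, proto_dim)
        d2 = torch.cdist(z, self.prototypes) ** 2  # squared distances, (batch, num_prototypes)
        sims = torch.log((d2 + 1) / (d2 + 1e-4))   # ProtoPNet-style similarity (assumed here)
        return self.head(sims)                     # action logits or values

The interpretability in this sketch comes from the final linear layer: each output is a weighted sum of similarities to a small set of human-recognizable example states, so a decision can be read off as "the current state resembles prototype k, which is associated with this action."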

Cite

Text

Kenny et al. "Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes." International Conference on Learning Representations, 2023.

Markdown

[Kenny et al. "Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/kenny2023iclr-interpretable/)

BibTeX

@inproceedings{kenny2023iclr-interpretable,
  title     = {{Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes}},
  author    = {Kenny, Eoin M. and Tucker, Mycal and Shah, Julie},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/kenny2023iclr-interpretable/}
}