State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding

Abstract

As more non-AI experts use complex AI systems for daily tasks, there has been an increasing effort to develop methods that produce explanations of AI decision making that are understandable by non-AI experts. Towards this effort, leveraging higher-level concepts to produce concept-based explanations has become a popular approach. Most concept-based explanations have been developed for classification techniques, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute a set of desiderata for defining "concepts" in sequential decision making settings. Additionally, inspired by the Protégé Effect, which states that explaining knowledge to others often reinforces one's own learning, we explore how concept-based explanations of an RL agent's decision making can in turn improve the agent's learning rate, as well as improve end-user understanding of the agent's decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that learns a joint embedding model between state-action pairs and concept-based explanations, and leverages this learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end users at deployment for improved task performance. Our experimental validations in Connect 4 and Lunar Lander demonstrate the dual benefit of S2E: it successfully informs reward shaping and improves the agent's learning rate, while also significantly improving end-user task performance at deployment.
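The core mechanism described in the abstract is a joint embedding between state-action pairs and concept-based explanations, whose similarity is reused both as a shaping signal during training and as a way to surface explanations at deployment. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's implementation: the encoder architectures, the hinge-style alignment loss, and the `shaping_bonus` scale are all assumptions made for the example.

```python
# Minimal sketch (assumptions only, not the authors' code) of a joint embedding
# between state-action pairs and concept-based explanations, plus a shaping term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StateActionEncoder(nn.Module):
    """Maps a (state, action) pair into the shared embedding space."""
    def __init__(self, state_dim: int, action_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(torch.cat([state, action], dim=-1)), dim=-1)


class ExplanationEncoder(nn.Module):
    """Maps a tokenized concept-based explanation into the same space."""
    def __init__(self, vocab_size: int, embed_dim: int = 64):
        super().__init__()
        self.bag = nn.EmbeddingBag(vocab_size, embed_dim)  # simple bag-of-tokens encoder

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.bag(token_ids), dim=-1)


def alignment_loss(sa_emb: torch.Tensor, expl_emb: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """Hinge loss pulling matched (state-action, explanation) pairs together and
    pushing each pair away from a shuffled, mismatched explanation in the batch."""
    pos = (sa_emb * expl_emb).sum(dim=-1)                  # cosine similarity, matched pairs
    neg = (sa_emb * expl_emb.roll(1, dims=0)).sum(dim=-1)  # cosine similarity, mismatched pairs
    return F.relu(margin + neg - pos).mean()


def shaping_bonus(sa_emb: torch.Tensor, expl_emb: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """Illustrative shaping term added to the environment reward: scaled similarity
    between the current state-action embedding and its associated explanation."""
    return scale * (sa_emb * expl_emb).sum(dim=-1)


if __name__ == "__main__":
    sa_enc, ex_enc = StateActionEncoder(state_dim=8, action_dim=2), ExplanationEncoder(vocab_size=100)
    states, actions = torch.randn(4, 8), torch.randn(4, 2)
    expl_tokens = torch.randint(0, 100, (4, 6))            # 4 explanations, 6 tokens each
    loss = alignment_loss(sa_enc(states, actions), ex_enc(expl_tokens))
    bonus = shaping_bonus(sa_enc(states, actions), ex_enc(expl_tokens))
    print(loss.item(), bonus.shape)
```

In this toy setup, the same learned similarity serves both roles the abstract names: during training it is added to the environment reward as a shaping bonus, and at deployment the nearest explanation embedding to the current state-action embedding can be shown to the end user.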

Cite

Text

Das et al. "State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding." Neural Information Processing Systems, 2023.

Markdown

[Das et al. "State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/das2023neurips-state2explanation/)

BibTeX

@inproceedings{das2023neurips-state2explanation,
  title     = {{State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding}},
  author    = {Das, Devleena and Chernova, Sonia and Kim, Been},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/das2023neurips-state2explanation/}
}