Towards Better Interpretability in Deep Q-Networks

Abstract

Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. While improvements in training algorithms continue at a brisk pace, theoretical and empirical studies of what these networks actually learn lag far behind. In this paper we propose an interpretable neural network architecture for Q-learning that provides a global explanation of the model's behavior using key-value memories, attention, and reconstructible embeddings. With a directed exploration strategy, our model reaches training rewards comparable to state-of-the-art deep Q-learning models. However, our results suggest that the features extracted by the network are extremely shallow, and subsequent testing on out-of-sample examples shows that the agent easily overfits to trajectories seen during training.
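To make the key-value attention idea from the abstract concrete, below is a minimal PyTorch sketch of one plausible readout: a state embedding attends over a set of learned keys, and the Q-values are the attention-weighted combination of per-key value rows. The class name, dimensions, and scaled dot-product readout are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueQHead(nn.Module):
    """Hypothetical key-value attention readout for Q-values.

    The state embedding attends over learned keys; each key is paired
    with a row of per-action values, and the Q-values are the
    attention-weighted sum of those rows.
    """

    def __init__(self, embed_dim: int, num_keys: int, num_actions: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, embed_dim))
        self.values = nn.Parameter(torch.randn(num_keys, num_actions))

    def forward(self, state_embedding: torch.Tensor) -> torch.Tensor:
        # Scaled dot-product similarity between the embedding and each key.
        scores = state_embedding @ self.keys.t() / self.keys.shape[1] ** 0.5
        attn = F.softmax(scores, dim=-1)   # (batch, num_keys)
        return attn @ self.values          # (batch, num_actions)

# Example: a batch of 8 state embeddings of dimension 64, 32 keys, 6 actions.
head = KeyValueQHead(embed_dim=64, num_keys=32, num_actions=6)
q_values = head(torch.randn(8, 64))        # shape (8, 6)

Pairing the same embedding with a decoder that reconstructs the observation (autoencoder-style) is one way to make the embeddings "reconstructible", so that attended keys can be decoded back into human-inspectable states.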

Cite

Text

Annasamy and Sycara. "Towards Better Interpretability in Deep Q-Networks." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33014561

Markdown

[Annasamy and Sycara. "Towards Better Interpretability in Deep Q-Networks." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/annasamy2019aaai-better/) doi:10.1609/AAAI.V33I01.33014561

BibTeX

@inproceedings{annasamy2019aaai-better,
  title     = {{Towards Better Interpretability in Deep Q-Networks}},
  author    = {Annasamy, Raghuram Mandyam and Sycara, Katia P.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {4561--4569},
  doi       = {10.1609/AAAI.V33I01.33014561},
  url       = {https://mlanthology.org/aaai/2019/annasamy2019aaai-better/}
}