Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks

Abstract

Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the large number of expensive parallel updates has so far limited the approach to small tabular cases. To tackle this problem, we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example application, we show that replacing the random actions in ε-greedy exploration with several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge and Super Mario All-Stars games.
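To make the all-goals update mentioned in the abstract concrete, here is a minimal tabular sketch in which a single transition updates the Q-values for every goal at once. It illustrates only the small tabular setting, not the convolutional method the paper proposes. The function name `all_goals_update`, the reward convention (1 when the next state is the goal, 0 otherwise, with the goal treated as terminal), and the values of `alpha` and `gamma` are assumptions made for this example and may differ from the paper's choices.

```python
import numpy as np


def all_goals_update(Q, s, a, s_next, alpha=0.1, gamma=0.9):
    """Apply one off-policy Q-learning step for every goal simultaneously.

    Q has shape (n_states, n_actions, n_goals); each goal is a state index.
    Assumed reward: 1 if s_next is the goal (terminal), otherwise 0.
    """
    n_goals = Q.shape[2]
    goals = np.arange(n_goals)

    # Boolean mask: for which goals does this transition reach the goal?
    reached = goals == s_next

    # Greedy bootstrap value of the next state, computed per goal.
    next_value = Q[s_next].max(axis=0)  # shape (n_goals,)

    # Target is 1 where the goal is reached, discounted bootstrap elsewhere.
    target = np.where(reached, 1.0, gamma * next_value)

    # Vectorized update across the whole goal dimension.
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Hypothetical 3-state chain with 2 actions; every state is also a goal.
Q = np.zeros((3, 2, 3))
all_goals_update(Q, s=0, a=1, s_next=1, alpha=0.5)
all_goals_update(Q, s=1, a=1, s_next=2, alpha=0.5)
all_goals_update(Q, s=0, a=1, s_next=1, alpha=0.5)
print(Q[0, 1])  # values of action 1 in state 0, one entry per goal
```

The cost of this update grows with the number of goals, and the table itself grows with states × actions × goals, which is the scaling problem the abstract says confined the approach to small tabular cases.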

Cite

Text

Pardo et al. "Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I04.5983

Markdown

[Pardo et al. "Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/pardo2020aaai-scaling/) doi:10.1609/AAAI.V34I04.5983

BibTeX

@inproceedings{pardo2020aaai-scaling,
  title     = {{Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks}},
  author    = {Pardo, Fabio and Levdik, Vitaly and Kormushev, Petar},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {5355--5362},
  doi       = {10.1609/AAAI.V34I04.5983},
  url       = {https://mlanthology.org/aaai/2020/pardo2020aaai-scaling/}
}