Better State Exploration Using Action Sequence Equivalence

Abstract

Incorporating prior knowledge into reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a \emph{tabula rasa} setting, where the agent must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence, that is, cases where different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual $\epsilon$-greedy strategy in a DQN, we demonstrate its potential in several environments with different dynamics structures.
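
To make the notion of action sequence equivalence concrete, the following is a minimal sketch on a hypothetical 2D grid with four moves. It is not the authors' method: it does not solve the convex optimization problem mentioned above, and only enumerates which sequences reach the same state and how often two uniformly sampled sequences collide. The names `MOVES`, `end_state`, and `equivalence_classes` are assumptions introduced for illustration.

```python
from itertools import product

# Hypothetical grid moves (illustrative only, not taken from the paper).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def end_state(seq):
    """Net displacement reached by applying a sequence of moves from the origin."""
    x = sum(MOVES[a][0] for a in seq)
    y = sum(MOVES[a][1] for a in seq)
    return (x, y)


def equivalence_classes(horizon):
    """Group all action sequences of the given length by the state they reach."""
    classes = {}
    for seq in product(MOVES, repeat=horizon):
        classes.setdefault(end_state(seq), []).append(seq)
    return classes


classes = equivalence_classes(2)
n_sequences = sum(len(v) for v in classes.values())
n_states = len(classes)

# Probability that two independent, uniformly random sequences end in the same state.
collision_prob = sum((len(v) / n_sequences) ** 2 for v in classes.values())

print(n_sequences, n_states)  # 16 9
print(collision_prob)         # 0.140625
```

In this toy setting, 16 sequences of length 2 reach only 9 distinct states, and uniform action sampling collides with probability 0.140625, compared with 1/9 (about 0.111) if the 9 reachable states were sampled uniformly. A strategy that reweights actions using known equivalences aims to close this kind of gap.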

Cite

Text

Grinsztajn et al. "Better State Exploration Using Action Sequence Equivalence." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Grinsztajn et al. "Better State Exploration Using Action Sequence Equivalence." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/grinsztajn2022neuripsw-better/)

BibTeX

@inproceedings{grinsztajn2022neuripsw-better,
  title     = {{Better State Exploration Using Action Sequence Equivalence}},
  author    = {Grinsztajn, Nathan and Johnstone, Toby and Ferret, Johan and Preux, Philippe},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/grinsztajn2022neuripsw-better/}
}