Goal-Directed Planning via Hindsight Experience Replay
Abstract
We consider the problem of goal-directed planning under a deterministic transition model. Monte Carlo Tree Search has shown remarkable performance in solving deterministic control problems. It has been extended to complex continuous domains through function approximators to bias the search of the planning tree in AlphaZero. Nonetheless, these algorithms still struggle with control problems with sparse rewards, such as goal-directed domains, where a positive reward is awarded only when reaching a goal state. In this work, we recast AlphaZero with Hindsight Experience Replay to tackle complex goal-directed planning tasks. We perform a thorough empirical evaluation in several simulated domains, including a novel application to a quantum compiling domain.
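To illustrate why hindsight relabeling helps with sparse rewards, the following is a minimal sketch of the general Hindsight Experience Replay idea, not the method of this paper: transitions from a failed episode are stored a second time with the goal replaced by the state the agent actually reached. The toy integer states, the "relabel with the final state" strategy, and the names `sparse_reward` and `relabel_with_final_state` are all assumptions made for this example; how relabeled data is combined with AlphaZero's search is not shown.

```python
from typing import List, Tuple

# (state, action, next_state); states are plain integers in this toy example
Transition = Tuple[int, int, int]
# (state, action, next_state, goal, reward)
Sample = Tuple[int, int, int, int, float]


def sparse_reward(next_state: int, goal: int) -> float:
    """Goal-directed sparse reward: 1 only when the goal state is reached."""
    return 1.0 if next_state == goal else 0.0


def relabel_with_final_state(episode: List[Transition], goal: int) -> List[Sample]:
    """Return samples for the original goal plus, if different, a copy
    relabeled with the last state the episode actually reached."""
    if not episode:
        return []
    achieved = episode[-1][2]  # hindsight goal: where the episode ended
    goals = [goal] if achieved == goal else [goal, achieved]
    samples: List[Sample] = []
    for g in goals:
        for s, a, s_next in episode:
            samples.append((s, a, s_next, g, sparse_reward(s_next, g)))
    return samples


# A failed episode on a 1-D chain: the agent wanted state 5 but stopped at 2.
episode = [(0, 1, 1), (1, 1, 2)]
samples = relabel_with_final_state(episode, goal=5)
original_rewards = [r for *_, g, r in samples if g == 5]
hindsight_rewards = [r for *_, g, r in samples if g == 2]
print(original_rewards, hindsight_rewards)
```

Under the original goal every reward in this episode is zero, so it carries no learning signal; under the relabeled goal the last transition is rewarded, which is the property that makes hindsight relabeling attractive in goal-directed domains.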
Cite
Text
Moro et al. "Goal-Directed Planning via Hindsight Experience Replay." International Conference on Learning Representations, 2022.
Markdown
[Moro et al. "Goal-Directed Planning via Hindsight Experience Replay." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/moro2022iclr-goaldirected/)
BibTeX
@inproceedings{moro2022iclr-goaldirected,
title = {{Goal-Directed Planning via Hindsight Experience Replay}},
author = {Moro, Lorenzo and Likmeta, Amarildo and Prati, Enrico and Restelli, Marcello},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/moro2022iclr-goaldirected/}
}