Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains

Abstract

Model-based control strategies are critical for sample-efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning by simulating one-step experience to update the action-value function. This elegant planning strategy has mostly been explored in the tabular setting. The aim of this paper is to revisit sample-based planning in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states, which propagate information backward from a state more quickly. We introduce a semi-parametric model-learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods when learning in continuous-state problems, and that the performance gap grows when moving to stochastic domains of increasing size.
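To make the Dyna idea in the abstract concrete, the following is a minimal sketch of tabular Dyna-Q on a small deterministic chain. It is not the paper's REM-Dyna and does not cover continuous states or predecessor sampling. The environment, the function name `dyna_q`, and all hyperparameters are assumptions chosen for illustration. After each real transition, the agent stores it in a one-step model and then performs several simulated updates by sampling from that model.

```python
import random


def dyna_q(n_states=6, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical chain: start at state 0, goal at the last state."""
    rng = random.Random(seed)
    actions = (0, 1)  # 0 = move left, 1 = move right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}  # learned one-step model: (s, a) -> (reward, next_state, done)

    def step(s, a):
        # Assumed toy environment: deterministic moves, reward 1 on reaching the goal.
        s_next = max(0, s - 1) if a == 0 else s + 1
        done = s_next == n_states - 1
        return (1.0 if done else 0.0), s_next, done

    def update(s, a, r, s_next, done):
        # One-step Q-learning update, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])

            # Learn from the real transition and record it in the model.
            r, s_next, done = step(s, a)
            update(s, a, r, s_next, done)
            model[(s, a)] = (r, s_next, done)

            # Planning: replay simulated one-step experience drawn from the model.
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                pr, pn, pd = model[(ps, pa)]
                update(ps, pa, pr, pn, pd)

            s = s_next
    return q


if __name__ == "__main__":
    q_values = dyna_q()
    for state in range(5):
        print(state, round(q_values[(state, 0)], 3), round(q_values[(state, 1)], 3))
```

In this sketch the planning loop samples previously visited state-action pairs uniformly, which makes it behave much like replay from a buffer. The abstract's point is that a learned model can instead choose which states to sample from, for example predecessors of a recently updated state.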

Cite

Text

Pan et al. "Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains." International Joint Conference on Artificial Intelligence, 2018. doi:10.24963/IJCAI.2018/666

Markdown

[Pan et al. "Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/pan2018ijcai-organizing/) doi:10.24963/IJCAI.2018/666

BibTeX

@inproceedings{pan2018ijcai-organizing,
  title     = {{Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains}},
  author    = {Pan, Yangchen and Zaheer, Muhammad and White, Adam and Patterson, Andrew and White, Martha},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {4794--4800},
  doi       = {10.24963/IJCAI.2018/666},
  url       = {https://mlanthology.org/ijcai/2018/pan2018ijcai-organizing/}
}