Frequency-Based Search-Control in Dyna

Abstract

Model-based reinforcement learning has been shown empirically to improve sample efficiency. In particular, Dyna is an elegant model-based architecture that integrates learning and planning and offers great flexibility in how a model is used. One of the most important components of Dyna is search-control, the process of generating states or state-action pairs from which we query the model to acquire simulated experiences. Search-control is critical to improving learning efficiency. In this work, we propose a simple and novel search-control strategy that searches high-frequency regions of the value function. Our main intuition builds on the Shannon sampling theorem from signal processing, which indicates that a high-frequency signal requires more samples to reconstruct. We empirically show that a high-frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high-frequency regions of the value function to query the model and acquire more samples. We develop a simple strategy to locally measure the frequency of a function by gradient and Hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna and conduct experiments to show its properties and effectiveness on benchmark domains.
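The idea of measuring local frequency by gradient and Hessian norms can be illustrated with a minimal sketch. It is not the paper's implementation: the finite-difference estimators, the unweighted sum used as a score, and the names `grad_norm_sq`, `hessian_frob_sq`, `local_frequency_score`, and `select_search_control_states` are all assumptions made for illustration. The sketch ranks candidate states by the score so that states in sharply varying regions of a value function would be preferred when querying a model.

```python
import math

def grad_norm_sq(f, x, h=1e-4):
    """Squared Euclidean norm of a central-difference gradient of f at x."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        d = (f(xp) - f(xm)) / (2.0 * h)
        total += d * d
    return total

def hessian_frob_sq(f, x, h=1e-3):
    """Squared Frobenius norm of a finite-difference Hessian of f at x."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            d2 = (shifted(h, h) - shifted(h, -h)
                  - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
            total += d2 * d2
    return total

def local_frequency_score(f, x):
    """Hypothetical score: unweighted sum of the two squared norms (an assumption)."""
    return grad_norm_sq(f, x) + hessian_frob_sq(f, x)

def select_search_control_states(value_fn, candidates, k):
    """Return the k candidate states with the largest local frequency score."""
    ranked = sorted(candidates,
                    key=lambda s: local_frequency_score(value_fn, s),
                    reverse=True)
    return ranked[:k]

# A low-frequency and a high-frequency 1-D function, evaluated at the same point.
low = lambda x: math.sin(x[0])
high = lambda x: math.sin(8.0 * x[0])
score_low = local_frequency_score(low, [0.3])
score_high = local_frequency_score(high, [0.3])

# A toy value function with a sharp transition around s = 0.5.
toy_value = lambda s: math.tanh(20.0 * (s[0] - 0.5))
candidates = [[i / 10] for i in range(11)]
chosen = select_search_control_states(toy_value, candidates, k=3)
```

In this toy setup the higher-frequency sine receives a much larger score at the same point, and the selected states cluster around the sharp transition of `toy_value`. A Dyna-style agent could use such a ranking to decide which states to start simulated rollouts from, though the weighting between the two norms and the way candidates are generated are design choices this sketch leaves open.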

Cite

Text

Pan et al. "Frequency-Based Search-Control in Dyna." International Conference on Learning Representations, 2020.

Markdown

[Pan et al. "Frequency-Based Search-Control in Dyna." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/pan2020iclr-frequencybased/)

BibTeX

@inproceedings{pan2020iclr-frequencybased,
  title     = {{Frequency-Based Search-Control in Dyna}},
  author    = {Pan, Yangchen and Mei, Jincheng and Farahmand, Amir-massoud},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/pan2020iclr-frequencybased/}
}