Dynamic-Depth Context Tree Weighting

Abstract

Reinforcement learning (RL) in partially observable settings is challenging because the agent’s observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes dynamic-depth context tree weighting (D2-CTW), a model-learning method that addresses these limitations. D2-CTW dynamically expands a suffix tree while ensuring that the size of the model, but not its depth, remains bounded. We show that D2-CTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply D2-CTW to model-based RL, showing that, on tasks that require memory of past observations, D2-CTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
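To make the method's starting point concrete, below is a minimal sketch of standard fixed-depth context tree weighting (CTW) for binary sequences, the baseline predictor that D2-CTW extends with dynamic depth and bounded size. This is not the paper's D2-CTW algorithm; the class and function names (`Node`, `update`, `predict_one`) are illustrative, and the depth is fixed by hand rather than grown dynamically.

```python
import copy
import math

class Node:
    """One context (history suffix) in the suffix tree."""
    def __init__(self):
        self.a = 0            # number of 0s observed in this context
        self.b = 0            # number of 1s observed in this context
        self.log_pe = 0.0     # log KT-estimator probability of the data seen here
        self.log_pw = 0.0     # log CTW mixture probability
        self.children = {}    # keyed by the next-older context bit

def log_sum_exp(x, y):
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))

def kt_log_prob(node, sym):
    # Krichevsky-Trofimov estimator: P(sym) = (count_sym + 1/2) / (a + b + 1)
    count = node.b if sym == 1 else node.a
    return math.log((count + 0.5) / (node.a + node.b + 1.0))

def update(root, context, sym, depth):
    """Observe `sym` after `context`; update KT and mixture probabilities."""
    # Walk down the tree along the reversed context (most recent bit first).
    path = [root]
    for bit in reversed(context[-depth:]):
        path.append(path[-1].children.setdefault(bit, Node()))
    # Update from the deepest node back up to the root.
    for i in range(len(path) - 1, -1, -1):
        node = path[i]
        node.log_pe += kt_log_prob(node, sym)
        if sym == 1:
            node.b += 1
        else:
            node.a += 1
        if i == len(path) - 1:
            node.log_pw = node.log_pe    # leaf: no deeper model to mix in
        else:
            # Internal node: mix its own KT model with the split into children.
            log_split = sum(c.log_pw for c in node.children.values())
            node.log_pw = log_sum_exp(math.log(0.5) + node.log_pe,
                                      math.log(0.5) + log_split)

def predict_one(root, context, depth):
    """P(next symbol = 1 | context), via a trial update on a copy of the tree."""
    trial = copy.deepcopy(root)
    update(trial, context, 1, depth)
    return math.exp(trial.log_pw - root.log_pw)

# Demo: a deterministic alternating sequence is trivial for a depth-2 model.
seq = [0, 1] * 30
depth = 2
root = Node()
for t in range(depth, len(seq)):
    update(root, seq[:t], seq[t], depth)

p1 = predict_one(root, seq, depth)   # history ends ...0, 1, so the next bit is 0
print(f"P(next = 1) = {p1:.3f}")     # close to 0
```

Note that a fixed-depth tree of this kind grows exponentially in `depth`, which is the memory limitation the abstract refers to: D2-CTW instead expands the tree where the data warrants it, bounding model size rather than depth.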

Cite

Text

Messias and Whiteson. "Dynamic-Depth Context Tree Weighting." Neural Information Processing Systems, 2017.

Markdown

[Messias and Whiteson. "Dynamic-Depth Context Tree Weighting." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/messias2017neurips-dynamicdepth/)

BibTeX

@inproceedings{messias2017neurips-dynamicdepth,
  title     = {{Dynamic-Depth Context Tree Weighting}},
  author    = {Messias, Joao V and Whiteson, Shimon},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {3328--3337},
  url       = {https://mlanthology.org/neurips/2017/messias2017neurips-dynamicdepth/}
}