Nonparametric Bayesian Policy Priors for Reinforcement Learning

Doshi-Velez, Finale; Wingate, David; Roy, Nicholas; Tenenbaum, Joshua B.

NeurIPS 2010 pp. 532-540

Abstract

We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.

Cite

Text

Doshi-Velez et al. "Nonparametric Bayesian Policy Priors for Reinforcement Learning." Neural Information Processing Systems, 2010.

Markdown

[Doshi-Velez et al. "Nonparametric Bayesian Policy Priors for Reinforcement Learning." Neural Information Processing Systems, 2010.](https://mlanthology.org/neurips/2010/doshivelez2010neurips-nonparametric/)

BibTeX

@inproceedings{doshivelez2010neurips-nonparametric,
  title     = {{Nonparametric Bayesian Policy Priors for Reinforcement Learning}},
  author    = {Doshi-Velez, Finale and Wingate, David and Roy, Nicholas and Tenenbaum, Joshua B.},
  booktitle = {Neural Information Processing Systems},
  year      = {2010},
  pages     = {532--540},
  url       = {https://mlanthology.org/neurips/2010/doshivelez2010neurips-nonparametric/}
}