Minimax Model Learning

Abstract

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
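For readers who want a concrete picture of the minimax idea described above, the sketch below shows a generic adversarial model-learning loop in PyTorch: a transition model is trained to minimize a weighted discrepancy that an adversarial test function tries to maximize. This is an illustrative simplification, not the paper's exact objective; the network architectures, the deterministic next-state model, the constant weights standing in for a distribution-shift correction w(s, a), and all hyperparameters are assumptions made purely for the example.

# Minimal minimax-style model-learning sketch (illustrative, not the paper's exact loss).
import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    """Deterministic next-state predictor s' ~ P_theta(s, a) (a simplification)."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, s_dim))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class TestFunction(nn.Module):
    """Adversarial test function f_phi over states (stands in for a value-function class)."""
    def __init__(self, s_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))
    def forward(self, s):
        return self.net(s).squeeze(-1)

def minimax_loss(model, f, s, a, s_next, weights):
    """Weighted model error as seen through f: w(s, a) * (f(P(s, a)) - f(s'))."""
    pred = model(s, a)
    discrepancy = weights * (f(pred) - f(s_next))
    # Squaring the mean keeps the objective non-negative for the minimizing player.
    return discrepancy.mean() ** 2

# Toy training loop on random transitions (placeholders for logged off-policy data).
s_dim, a_dim, n = 4, 2, 256
s, a, s_next = torch.randn(n, s_dim), torch.randn(n, a_dim), torch.randn(n, s_dim)
weights = torch.ones(n)  # stand-in for a distribution-shift correction w(s, a)

model, f = TransitionModel(s_dim, a_dim), TestFunction(s_dim)
opt_model = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(200):
    # Inner maximization: the adversary f ascends the discrepancy.
    opt_f.zero_grad()
    (-minimax_loss(model, f, s, a, s_next, weights)).backward()
    opt_f.step()
    # Outer minimization: the transition model descends the discrepancy.
    opt_model.zero_grad()
    minimax_loss(model, f, s, a, s_next, weights).backward()
    opt_model.step()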

Cite

Text

Voloshin et al. "Minimax Model Learning." Artificial Intelligence and Statistics, 2021.

Markdown

[Voloshin et al. "Minimax Model Learning." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/voloshin2021aistats-minimax/)

BibTeX

@inproceedings{voloshin2021aistats-minimax,
  title     = {{Minimax Model Learning}},
  author    = {Voloshin, Cameron and Jiang, Nan and Yue, Yisong},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2021},
  pages     = {1612--1620},
  volume    = {130},
  url       = {https://mlanthology.org/aistats/2021/voloshin2021aistats-minimax/}
}