Advantage Amplification in Slowly Evolving Latent-State Environments

Abstract

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle called advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where the latent state evolves slowly, and demonstrate their performance empirically in a stylized user-modeling task.
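
As a rough sketch of the amplification intuition (an illustrative derivation under simplifying assumptions, not the paper's exact statement): if a learned value function approximates the optimal Q-function to within an error of epsilon, greedy action selection can only be misled at states where the advantage gap is small; and if the latent state is nearly frozen while an action is held for k steps, the per-step gap compounds. The notation A_k below is ours, standing for the advantage of the k-step held action.

  % Illustrative sketch only: assumes the latent state stays (approximately)
  % fixed while an action is held for k steps.
  \[
    \lVert \hat{Q} - Q^{*} \rVert_{\infty} \le \epsilon
    \quad\Longrightarrow\quad
    \text{greedy errors require } \;
    A^{*}(s,a) := Q^{*}(s,\pi^{*}(s)) - Q^{*}(s,a) < 2\epsilon,
  \]
  \[
    A_{k}(s,a) \;\approx\; \sum_{t=0}^{k-1} \gamma^{t}\, A^{*}(s,a)
    \;=\; \frac{1-\gamma^{k}}{1-\gamma}\, A^{*}(s,a),
  \]

so holding (temporally aggregating) actions amplifies the gap by a factor approaching 1/(1 - gamma), pushing it above the 2*epsilon error floor at which greedy selection becomes unreliable.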

Cite

Text

Mladenov et al. "Advantage Amplification in Slowly Evolving Latent-State Environments." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/439

Markdown

[Mladenov et al. "Advantage Amplification in Slowly Evolving Latent-State Environments." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/mladenov2019ijcai-advantage/) doi:10.24963/IJCAI.2019/439

BibTeX

@inproceedings{mladenov2019ijcai-advantage,
  title     = {{Advantage Amplification in Slowly Evolving Latent-State Environments}},
  author    = {Mladenov, Martin and Meshi, Ofer and Ooi, Jayden and Schuurmans, Dale and Boutilier, Craig},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {3165--3172},
  doi       = {10.24963/IJCAI.2019/439},
  url       = {https://mlanthology.org/ijcai/2019/mladenov2019ijcai-advantage/}
}