Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that aims at estimating the data-generating policy. In particular, this work considers a scenario where data are collected from multiple sources. Because they neglect data heterogeneity, existing approaches cannot provide good estimates and thereby impede policy learning. To overcome this drawback, the present study proposes a latent variable model and a model-learning algorithm that infer a set of policies from data, allowing an agent to use as its behavior policy the policy that best describes a particular trajectory. To illustrate the benefit of such a fine-grained characterization of multi-source data, this work showcases how the proposed model can be incorporated into an existing offline RL algorithm. Lastly, through extensive empirical evaluation, this work confirms the risks of neglecting data heterogeneity and the efficacy of the proposed model.
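The abstract describes inferring a set of latent behavior policies from heterogeneous trajectories and assigning each trajectory to the policy that best explains it. The following is a minimal sketch of that idea, not the paper's actual algorithm: a tabular mixture-of-policies model fit by EM, where each trajectory receives a responsibility over K candidate policies (all function names and the EM formulation here are illustrative assumptions).

```python
import numpy as np

def em_mixture_policies(trajectories, n_states, n_actions, K, n_iter=50, seed=0):
    """Illustrative sketch: fit K tabular policies to multi-source trajectories via EM.

    trajectories: list of trajectories, each a list of (state, action) pairs.
    Returns mixing weights w (K,), policies pi (K, S, A), and per-trajectory
    responsibilities r (N, K), where r[i] indicates which latent policy best
    describes trajectory i.
    """
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)                                   # mixing weights
    pi = rng.dirichlet(np.ones(n_actions), size=(K, n_states))  # (K, S, A)
    for _ in range(n_iter):
        # E-step: log-responsibility of each latent policy for each trajectory
        log_r = np.zeros((len(trajectories), K))
        for i, traj in enumerate(trajectories):
            s = np.array([sa[0] for sa in traj])
            a = np.array([sa[1] for sa in traj])
            log_r[i] = np.log(w) + np.log(pi[:, s, a]).sum(axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)             # stabilize softmax
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and policies from responsibility-weighted counts
        w = r.mean(axis=0)
        counts = np.full((K, n_states, n_actions), 1e-3)      # additive smoothing
        for i, traj in enumerate(trajectories):
            for s, a in traj:
                counts[:, s, a] += r[i]
        pi = counts / counts.sum(axis=2, keepdims=True)
    return w, pi, r
```

In a downstream offline RL algorithm, each trajectory's behavior policy would then be taken as `pi[r[i].argmax()]` rather than a single policy pooled over all sources, which is the fine-grained characterization the abstract argues for.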

Cite

Text

Zhang and Kashima. "Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I9.26326

Markdown

[Zhang and Kashima. "Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/zhang2023aaai-behavior/) doi:10.1609/AAAI.V37I9.26326

BibTeX

@inproceedings{zhang2023aaai-behavior,
  title     = {{Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning}},
  author    = {Zhang, Guoxi and Kashima, Hisashi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {11201--11209},
  doi       = {10.1609/AAAI.V37I9.26326},
  url       = {https://mlanthology.org/aaai/2023/zhang2023aaai-behavior/}
}