Improving Generalization with Approximate Factored Value Functions
Abstract
Reinforcement learning in general unstructured MDPs presents a challenging learning problem. However, certain MDP structures, such as factorization, are known to simplify learning. This fact is of limited use in complex tasks with high-dimensional state spaces, which rarely exhibit such structure; even when the structure is present, it is typically unknown. In this work, we turn this observation on its head: instead of developing algorithms for structured MDPs, we propose a representation learning algorithm that approximates an unstructured MDP with one that has factorized structure. We then use these factors as a more convenient state representation for downstream learning. The particular structure that we leverage is reward factorization, which defines a more compact class of MDPs that admit factorized value functions. We empirically verify the effectiveness of our approach in terms of faster training (better sample complexity) and robust zero-shot transfer (better generalization) on the ProcGen benchmark and the MiniGrid environments.
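As a minimal sketch of the property the abstract invokes (notation is ours, not taken from the paper): if the reward decomposes additively as $r(s, a) = \sum_{k=1}^{K} r_k(s, a)$, then by linearity of expectation the value function decomposes into matching components,

$$Q^\pi(s, a) = \sum_{k=1}^{K} Q_k^\pi(s, a), \qquad Q_k^\pi(s, a) = \mathbb{E}_\pi\!\left[\sum_{t \ge 0} \gamma^t \, r_k(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a\right],$$

so each component $Q_k^\pi$ can in principle be learned and reused on its own.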
Cite
Text
Sodhani et al. "Improving Generalization with Approximate Factored Value Functions." Transactions on Machine Learning Research, 2023.

Markdown
[Sodhani et al. "Improving Generalization with Approximate Factored Value Functions." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/sodhani2023tmlr-improving/)

BibTeX
@article{sodhani2023tmlr-improving,
title = {{Improving Generalization with Approximate Factored Value Functions}},
author = {Sodhani, Shagun and Levine, Sergey and Zhang, Amy},
journal = {Transactions on Machine Learning Research},
year = {2023},
url = {https://mlanthology.org/tmlr/2023/sodhani2023tmlr-improving/}
}