Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Abstract

Reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages the factored structure of the FMDP. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and it improves on the best previous result for FMDPs (Osband & Van Roy, 2014) by a factor of $\sqrt{nH|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace, $H$ is the planning horizon, and $n$ is the number of factored transition components. To show the optimality of our bounds, we also provide a lower bound for FMDPs, which indicates that our algorithm is near-optimal with respect to the number of timesteps $T$, the horizon $H$, and the cardinality of the factored state-action subspaces. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provide the first sample-efficient algorithm for it, based on FMDP-BF.
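To illustrate the exponential gap claimed above (this worked example is ours, not taken from the paper): if the state space factors into $n$ binary components, a non-factored algorithm pays for the full product space, whereas a factored bound pays only for the sum of the subspace sizes,

$$|\mathcal{S}| \;=\; \prod_{i=1}^{n} |\mathcal{S}_i| \;=\; 2^n \qquad \text{versus} \qquad \sum_{i=1}^{n} |\mathcal{S}_i| \;=\; 2n,$$

so regret bounds that scale with $\sum_i |\mathcal{S}_i|$ rather than with $|\mathcal{S}|$ are exponentially smaller in the number of factors $n$.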

Cite

Text

Chen et al. "Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL." International Conference on Learning Representations, 2021.

Markdown

[Chen et al. "Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/chen2021iclr-efficient/)

BibTeX

@inproceedings{chen2021iclr-efficient,
  title     = {{Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL}},
  author    = {Chen, Xiaoyu and Hu, Jiachen and Li, Lihong and Wang, Liwei},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/chen2021iclr-efficient/}
}