Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data
Abstract
The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems with arbitrarily large state spaces and general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. For this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption for large-scale problems with arbitrarily large state spaces and general function approximation under the hybrid robust $\phi$-regularized reinforcement learning framework.
Cite
Text
Panaganti et al. "Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data." International Conference on Machine Learning, 2024.
Markdown
[Panaganti et al. "Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/panaganti2024icml-modelfree/)
BibTeX
@inproceedings{panaganti2024icml-modelfree,
title = {{Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data}},
author = {Panaganti, Kishan and Wierman, Adam and Mazumdar, Eric},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {39324--39363},
volume = {235},
url = {https://mlanthology.org/icml/2024/panaganti2024icml-modelfree/}
}