Actor-Critic Based Improper Reinforcement Learning

Abstract

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. Both algorithms operate over a class of improper mixtures of the given controllers. For the first case, we derive convergence rate guarantees assuming access to a gradient oracle. For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case. Numerical results on (i) the standard control theoretic benchmark of stabilizing an inverted pendulum; and (ii) a constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.

Cite

Text

Zaki et al. "Actor-Critic Based Improper Reinforcement Learning." International Conference on Machine Learning, 2022.

Markdown

[Zaki et al. "Actor-Critic Based Improper Reinforcement Learning." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/zaki2022icml-actorcritic/)

BibTeX

@inproceedings{zaki2022icml-actorcritic,
  title     = {{Actor-Critic Based Improper Reinforcement Learning}},
  author    = {Zaki, Mohammadi and Mohan, Avi and Gopalan, Aditya and Mannor, Shie},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {25867-25919},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/zaki2022icml-actorcritic/}
}