Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

Abstract

Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, a formal theoretical formulation of the problem has been lacking. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal \emph{deployment complexity}, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation of DE-RL is flexible and can serve as a building block for other practically relevant settings; we give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be worth future investigation.
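The abstract describes DE-RL as an interaction protocol: a fixed policy is deployed, a large batch of trajectories is collected, and the policy is only updated between deployments, with the number of deployments as the quantity to minimize. The sketch below illustrates that protocol only; it is not the paper's algorithm, and the names (env, collect_batch, update_policy, K, N) are hypothetical placeholders.

# Minimal sketch of the DE-RL interaction loop described in the abstract.
# All helper names are hypothetical; any offline learner can be plugged in.

def collect_batch(env, policy, num_trajectories):
    """One deployment: run the fixed deployed policy and return a batch of trajectories."""
    return [env.rollout(policy) for _ in range(num_trajectories)]

def update_policy(dataset):
    """Offline policy update from all data gathered so far (placeholder)."""
    raise NotImplementedError  # e.g., a batch RL or planning step goes here

def deployment_efficient_rl(env, initial_policy, K, N):
    """Return a policy after K deployments, each collecting N trajectories.

    The deployment complexity is K; the DE-RL objective is to make K as small
    as possible while still ending with a near-optimal policy.
    """
    policy, dataset = initial_policy, []
    for _ in range(K):
        dataset.extend(collect_batch(env, policy, N))  # policy is fixed within a deployment
        policy = update_policy(dataset)                # updates happen only between deployments
    return policy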

Cite

Text

Huang et al. "Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality." International Conference on Learning Representations, 2022.

Markdown

[Huang et al. "Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/huang2022iclr-deploymentefficient/)

BibTeX

@inproceedings{huang2022iclr-deploymentefficient,
  title     = {{Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality}},
  author    = {Huang, Jiawei and Chen, Jinglin and Zhao, Li and Qin, Tao and Jiang, Nan and Liu, Tie-Yan},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/huang2022iclr-deploymentefficient/}
}