IPO: Interior-Point Policy Optimization Under Constraints

Abstract

In this paper, we study reinforcement learning (RL) algorithms for real-world decision problems whose objective is to maximize long-term reward while satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. The proposed method is easy to implement, comes with performance guarantees, and handles general cumulative multi-constraint settings. In extensive evaluations against state-of-the-art baselines, IPO outperforms the baselines in terms of both reward maximization and constraint satisfaction.
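As a rough sketch of the core construction (our notation, not quoted from the paper): a constrained policy search, maximize J_R(π) subject to J_{C_i}(π) ≤ d_i for i = 1, …, m, is replaced by an unconstrained surrogate in which each constraint contributes a logarithmic barrier term:

  \max_{\pi} \; J_R(\pi) \;+\; \sum_{i=1}^{m} \frac{1}{t}\,\log\bigl( d_i - J_{C_i}(\pi) \bigr)

Here J_R is the expected long-term reward, J_{C_i} is the i-th cumulative cost, d_i is its limit, and t > 0 is a hyperparameter. The barrier tends to negative infinity as a cost approaches its limit, so first-order updates on the surrogate are steered away from the constraint boundary, and larger t tightens the approximation to the original constrained problem. In the paper itself, the reward term is a PPO-style clipped surrogate rather than the raw return.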

Cite

Text

Liu et al. "IPO: Interior-Point Policy Optimization Under Constraints." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/aaai.v34i04.5932

Markdown

[Liu et al. "IPO: Interior-Point Policy Optimization Under Constraints." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/liu2020aaai-ipo/) doi:10.1609/aaai.v34i04.5932

BibTeX

@inproceedings{liu2020aaai-ipo,
  title     = {{IPO: Interior-Point Policy Optimization Under Constraints}},
  author    = {Liu, Yongshuai and Ding, Jiaxin and Liu, Xin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {4940--4947},
  doi       = {10.1609/aaai.v34i04.5932},
  url       = {https://mlanthology.org/aaai/2020/liu2020aaai-ipo/}
}