Balancing Constraints and Rewards with Meta-Gradient D4PG

Abstract

Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. The constraint thresholds are often set incorrectly due to the complex nature of the system or the inability to verify them offline (e.g., no simulator or reasonable offline evaluation procedure exists). This can result in tasks that cannot be solved without violating the constraints. However, in many real-world cases, constraint violations are undesirable but not catastrophic, motivating the need for soft-constrained RL approaches. We present two soft-constrained RL approaches that utilize meta-gradients to find a good trade-off between expected return and minimizing constraint violations. We demonstrate the effectiveness of these approaches by showing that they consistently outperform the baselines across four different Mujoco domains.
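To give a concrete sense of the general idea, the following is a minimal, self-contained sketch (not the authors' code) of how a meta-gradient can adapt a Lagrangian penalty coefficient in a soft-constrained objective: an inner step improves parameters under the penalized return, and an outer step differentiates through that inner step to adjust the penalty. All names, surrogates, and hyperparameters here are hypothetical stand-ins; the paper's actual meta-gradient D4PG formulation differs in its details.

import jax
import jax.numpy as jnp

# Hypothetical differentiable surrogates (stand-ins for learned critics).
def expected_return(theta):
    return -jnp.sum((theta - 2.0) ** 2)   # maximized at theta = 2

def constraint_cost(theta):
    return jnp.sum(theta ** 2)            # grows with the magnitude of theta

THRESHOLD = 1.0
INNER_LR, META_LR = 0.1, 0.05

def inner_update(theta, lam):
    # Inner step: ascend the soft-constrained (penalized) objective.
    def penalized(t):
        violation = jnp.maximum(constraint_cost(t) - THRESHOLD, 0.0)
        return expected_return(t) - lam * violation
    return theta + INNER_LR * jax.grad(penalized)(theta)

def meta_loss(lam, theta):
    # Outer objective: after the inner update, trade off return against violation.
    theta_new = inner_update(theta, lam)
    violation = jnp.maximum(constraint_cost(theta_new) - THRESHOLD, 0.0)
    return -expected_return(theta_new) + violation

theta, lam = jnp.array([3.0]), jnp.array(1.0)
for _ in range(100):
    lam = lam - META_LR * jax.grad(meta_loss)(lam, theta)  # meta-gradient step on lam
    lam = jnp.maximum(lam, 0.0)                            # keep the multiplier non-negative
    theta = inner_update(theta, lam)                       # ordinary penalized update
print(theta, lam)

In this toy setup the penalty coefficient rises while the constraint is violated and relaxes once the cost falls below the threshold, which is the trade-off behaviour the abstract describes; the paper applies the same principle on top of D4PG with learned critics rather than analytic surrogates.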

Cite

Text

Calian et al. "Balancing Constraints and Rewards with Meta-Gradient D4PG." International Conference on Learning Representations, 2021.

Markdown

[Calian et al. "Balancing Constraints and Rewards with Meta-Gradient D4PG." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/calian2021iclr-balancing/)

BibTeX

@inproceedings{calian2021iclr-balancing,
  title     = {{Balancing Constraints and Rewards with Meta-Gradient D4PG}},
  author    = {Calian, Dan A. and Mankowitz, Daniel J. and Zahavy, Tom and Xu, Zhongwen and Oh, Junhyuk and Levine, Nir and Mann, Timothy},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/calian2021iclr-balancing/}
}