Reinforcement Learning Can Be More Efficient with Multiple Rewards

Abstract

Reward design is one of the most critical and challenging aspects of formulating a task as a reinforcement learning (RL) problem. In practice, it often takes several iterations of specifying a reward and learning with it to find one that leads to sample-efficient learning of the desired behavior. In this work, we instead study whether directly incorporating multiple alternative reward formulations of the same task into a single agent can lead to faster learning. We analyze multi-reward extensions of action-elimination algorithms and prove more favorable instance-dependent regret bounds than their single-reward counterparts, both in multi-armed bandits and in tabular Markov decision processes. For each state-action pair, our bounds scale with the inverse of the largest gap among all reward functions. This suggests that learning with multiple rewards can indeed be more sample-efficient, as long as the rewards agree on an optimal policy. We further prove that when the rewards do not agree, multi-reward action elimination in multi-armed bandits still learns a policy that is good across all reward functions.
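
To make the mechanism behind these bounds concrete, below is a minimal Python sketch of a multi-reward successive-elimination rule for multi-armed bandits. It illustrates the general idea rather than the paper's exact algorithm; the function name multi_reward_successive_elimination, the pull interface, and the particular Hoeffding-style confidence radius are assumptions made for this sketch. Each arm is observed under all reward formulations simultaneously and is eliminated as soon as any one of them shows it to be confidently suboptimal.

import numpy as np

def multi_reward_successive_elimination(pull, n_arms, n_rewards, horizon, delta=0.05):
    # Hypothetical interface: `pull(arm)` is assumed to return a length-n_rewards
    # vector of observed rewards in [0, 1], one entry per reward formulation.
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    sums = np.zeros((n_arms, n_rewards))
    pulls_used = 0
    while pulls_used + len(active) <= horizon and len(active) > 1:
        # Round-robin sweep: pull every still-active arm once.
        for arm in active:
            sums[arm] += pull(arm)
            counts[arm] += 1
            pulls_used += 1
        means = sums[active] / counts[active][:, None]   # shape (|active|, n_rewards)
        t = counts[active[0]]
        # Hoeffding-style confidence radius with a crude union bound (an assumption
        # of this sketch, not the paper's exact constants).
        rad = np.sqrt(np.log(4.0 * n_arms * n_rewards * t * t / delta) / (2.0 * t))
        best = means.max(axis=0)                         # best empirical mean per reward
        # Keep an arm only if NO reward function shows it to be confidently suboptimal.
        active = [a for i, a in enumerate(active)
                  if not np.any(means[i] + 2.0 * rad < best)]
    return active

Because an arm is dropped as soon as any single reward exposes a clear gap, its sample cost is governed by its largest gap across all rewards, which mirrors the inverse-largest-gap scaling described in the abstract (provided the rewards agree on an optimal arm).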

Cite

Text

Dann et al. "Reinforcement Learning Can Be More Efficient with Multiple Rewards." International Conference on Machine Learning, 2023.

Markdown

[Dann et al. "Reinforcement Learning Can Be More Efficient with Multiple Rewards." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/dann2023icml-reinforcement/)

BibTeX

@inproceedings{dann2023icml-reinforcement,
  title     = {{Reinforcement Learning Can Be More Efficient with Multiple Rewards}},
  author    = {Dann, Christoph and Mansour, Yishay and Mohri, Mehryar},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {6948--6967},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/dann2023icml-reinforcement/}
}