Online Learning with Expert Advice and Finite-Horizon Constraints

Abstract

In this paper, we study a sequential decision making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novelty of our setup is that the rewards and constraints are controlled by an adverse opponent. To solve our problem in a practical way, we propose an expert algorithm that guarantees both a vanish-ing regret and a sublinear number of violated constraints. The quality of this solution is demonstrated on a real-world power management problem. Our results support the hypothesis that online learning with convex cost constraints can be performed successfully in practice.

Cite

Text

Kveton et al. "Online Learning with Expert Advice and Finite-Horizon Constraints." AAAI Conference on Artificial Intelligence, 2008.

Markdown

[Kveton et al. "Online Learning with Expert Advice and Finite-Horizon Constraints." AAAI Conference on Artificial Intelligence, 2008.](https://mlanthology.org/aaai/2008/kveton2008aaai-online/)

BibTeX

@inproceedings{kveton2008aaai-online,
  title     = {{Online Learning with Expert Advice and Finite-Horizon Constraints}},
  author    = {Kveton, Branislav and Yu, Jia Yuan and Theocharous, Georgios and Mannor, Shie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2008},
  pages     = {331-336},
  url       = {https://mlanthology.org/aaai/2008/kveton2008aaai-online/}
}