Online Learning with Constraints

Abstract

We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.

Cite

Text

Mannor and Tsitsiklis. "Online Learning with Constraints." Annual Conference on Computational Learning Theory, 2006. doi:10.1007/11776420_39

Markdown

[Mannor and Tsitsiklis. "Online Learning with Constraints." Annual Conference on Computational Learning Theory, 2006.](https://mlanthology.org/colt/2006/mannor2006colt-online-a/) doi:10.1007/11776420_39

BibTeX

@inproceedings{mannor2006colt-online-a,
  title     = {{Online Learning with Constraints}},
  author    = {Mannor, Shie and Tsitsiklis, John N.},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2006},
  pages     = {529--543},
  doi       = {10.1007/11776420_39},
  url       = {https://mlanthology.org/colt/2006/mannor2006colt-online-a/}
}