Online Learning with Sample Path Constraints

Abstract

We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to sample-path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem.
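
The abstract's central quantity admits a compact definition. In the underlying repeated game, at each stage the decision maker picks an action a from a finite set A and Nature picks b from a finite set B, yielding a reward r(a, b) and a constraint value c(a, b) whose long-term average must stay at or below a threshold c_0. Writing q for the empirical distribution of Nature's actions and \Delta(A) for the simplex of mixed actions, the reward-in-hindsight is the value of the linear program

r^*(q) = \max_{p \in \Delta(A)} \sum_{a,b} p(a) q(b) r(a,b)   subject to   \sum_{a,b} p(a) q(b) c(a,b) \le c_0.

The notation here (r, c, c_0, \Delta(A)) paraphrases the abstract rather than quoting the paper. As a minimal sketch, assuming reward and constraint matrices R and C indexed by (a, b), the LP can be solved directly:

import numpy as np
from scipy.optimize import linprog

def reward_in_hindsight(R, C, q, c0):
    """Value of the LP defining r^*(q): maximize p^T R q over the simplex
    Delta(A), subject to the average constraint p^T C q <= c0.
    R, C: (|A|, |B|) reward and constraint matrices (illustrative notation,
    not the paper's). q: empirical distribution of Nature's actions,
    shape (|B|,). Assumes the constraint is feasible for some mixed p.
    """
    rq = R @ q  # expected reward of each pure action under q
    cq = C @ q  # expected constraint value of each pure action under q
    n = len(rq)
    res = linprog(
        c=-rq,                              # linprog minimizes, so negate
        A_ub=cq[None, :], b_ub=[c0],        # p^T C q <= c0
        A_eq=np.ones((1, n)), b_eq=[1.0],   # p is a probability vector...
        bounds=[(0.0, None)] * n,           # ...with nonnegative entries
    )
    return -res.fun, res.x  # optimal value r^*(q) and a maximizing p

The paper's results then concern r^* as a function of q: the function itself is not attainable in general, but its convex hull is, and with a single constraint that convex hull is the highest attainable function.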

Cite

Text

Mannor et al. "Online Learning with Sample Path Constraints." Journal of Machine Learning Research, 10:569–590, 2009.

Markdown

[Mannor et al. "Online Learning with Sample Path Constraints." Journal of Machine Learning Research, 10:569–590, 2009.](https://mlanthology.org/jmlr/2009/mannor2009jmlr-online/)

BibTeX

@article{mannor2009jmlr-online,
  title     = {{Online Learning with Sample Path Constraints}},
  author    = {Mannor, Shie and Tsitsiklis, John N. and Yu, Jia Yuan},
  journal   = {Journal of Machine Learning Research},
  year      = {2009},
  pages     = {569--590},
  volume    = {10},
  url       = {https://mlanthology.org/jmlr/2009/mannor2009jmlr-online/}
}