Online Decision Making with History-Average Dependent Costs

Abstract

In many online sequential decision-making scenarios, a learner’s choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.

Cite

Text

Hebbar and Langbort. "Online Decision Making with History-Average Dependent Costs." Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.

Markdown

[Hebbar and Langbort. "Online Decision Making with History-Average Dependent Costs." Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 2024.](https://mlanthology.org/l4dc/2024/hebbar2024l4dc-online/)

BibTeX

@inproceedings{hebbar2024l4dc-online,
  title     = {{Online Decision Making with History-Average Dependent Costs}},
  author    = {Hebbar, Vijeth and Langbort, Cedric},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics & Control Conference},
  year      = {2024},
  pages     = {480-491},
  volume    = {242},
  url       = {https://mlanthology.org/l4dc/2024/hebbar2024l4dc-online/}
}