Joint Online Learning and Decision-Making via Dual Mirror Descent

Abstract

We consider an online revenue maximization problem over a finite time horizon subject to lower and upper bounds on cost. At each period, an agent receives a context vector sampled i.i.d. from an unknown distribution and must make a decision adaptively. The revenue and cost functions depend on the context vector as well as on a fixed but possibly unknown parameter vector to be learned. We propose a novel offline benchmark and a new algorithm that mixes an online dual mirror descent scheme with a generic parameter learning process. When the parameter vector is known, we demonstrate an $O(\sqrt{T})$ regret bound as well as an $O(\sqrt{T})$ bound on the possible constraint violations. When the parameter vector is unknown and must be learned, we demonstrate that the regret and constraint violations are the sum of the previous $O(\sqrt{T})$ terms and terms that directly depend on the convergence of the learning process.
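To make the general recipe the abstract describes concrete, below is a minimal sketch of a dual-mirror-descent-style loop for an online revenue problem with a single budget (upper-bound) constraint. It is not the paper's exact algorithm: the linear revenue model, the stylized cost function, the entropic (multiplicative) dual update, the step size, and the ridge least-squares learner standing in for the "generic parameter learning process" are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                  # horizon
d = 3                     # context dimension
rho = 0.4                 # per-period budget rate; total budget is rho * T
eta = 1.0 / np.sqrt(T)    # dual step size (assumed)

theta_true = np.array([0.8, -0.3, 0.5])  # unknown revenue parameter (hypothetical)
theta_hat = np.zeros(d)                   # running estimate of the parameter

mu = 1.0                  # dual variable for the budget constraint
A = np.eye(d)             # ridge-regularized design matrix for the learner
b = np.zeros(d)

total_revenue, total_cost = 0.0, 0.0

for t in range(T):
    x = rng.normal(size=d) / np.sqrt(d)     # i.i.d. context vector

    # Predicted per-unit revenue and a stylized per-unit cost for this context.
    r_hat = theta_hat @ x
    cost_rate = 0.5 + 0.5 * abs(x[0])

    # Primal step: maximize the Lagrangian r_hat*z - mu*cost_rate*z over z in [0, 1].
    z = 1.0 if r_hat - mu * cost_rate > 0 else 0.0

    # Observe noisy revenue feedback and the realized cost.
    revenue = (theta_true @ x) * z + 0.1 * rng.normal() * z
    cost = cost_rate * z
    total_revenue += revenue
    total_cost += cost

    # Dual mirror descent with an entropic mirror map (multiplicative update):
    # the dual subgradient is (rho - cost), so mu grows when we overspend.
    mu *= np.exp(-eta * (rho - cost))
    mu = min(mu, 100.0)                     # keep the dual variable bounded

    # Generic learning step: ridge least squares on the observed (x, revenue) pairs.
    if z > 0:
        A += np.outer(x, x)
        b += revenue * x
        theta_hat = np.linalg.solve(A, b)

print(f"revenue={total_revenue:.1f}, cost={total_cost:.1f}, budget={rho * T:.1f}")
```

The sketch shows the division of labor the abstract points to: the primal decision only needs the current dual variable and the current parameter estimate, the dual update only needs the realized cost, and the learning process can be swapped out without changing the rest of the loop.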

Cite

Text

Lobos et al. "Joint Online Learning and Decision-Making via Dual Mirror Descent." International Conference on Machine Learning, 2021.

Markdown

[Lobos et al. "Joint Online Learning and Decision-Making via Dual Mirror Descent." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/lobos2021icml-joint/)

BibTeX

@inproceedings{lobos2021icml-joint,
  title     = {{Joint Online Learning and Decision-Making via Dual Mirror Descent}},
  author    = {Lobos, Alfonso and Grigas, Paul and Wen, Zheng},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {7080--7089},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/lobos2021icml-joint/}
}