Tight Bounds for Online Convex Optimization with Adversarial Constraints

Abstract

A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.

Cite

Text

Sinha and Vaze. "Tight Bounds for Online Convex Optimization with Adversarial Constraints." ICML 2024 Workshops: RLControlTheory, 2024.

Markdown

[Sinha and Vaze. "Tight Bounds for Online Convex Optimization with Adversarial Constraints." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/sinha2024icmlw-tight/)

BibTeX

@inproceedings{sinha2024icmlw-tight,
  title     = {{Tight Bounds for Online Convex Optimization with Adversarial Constraints}},
  author    = {Sinha, Abhishek and Vaze, Rahul},
  booktitle = {ICML 2024 Workshops: RLControlTheory},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/sinha2024icmlw-tight/}
}