Tight Bounds for Online Convex Optimization with Adversarial Constraints
Abstract
A well-studied generalization of standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves small regret and small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization, a classic tool from control theory. Surprisingly, the analysis is short and elegant.
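The abstract only names the two ingredients, so the following is a hedged sketch of how they typically fit together, not the paper's exact algorithm: a virtual queue `Q` (Lyapunov optimization) accumulates clipped constraint violations, each round's update descends a surrogate gradient that mixes the cost gradient with the queue-weighted constraint gradient, and the step size adapts to observed gradient norms in the AdaGrad style. The trade-off weight `V`, the box decision set, and the toy linear adversary are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumptions: linear costs/constraints, box decision
# set [0,1]^d, weight V); not the paper's exact COCO policy.
rng = np.random.default_rng(0)
T, d, V = 2000, 3, 1.0
x = np.zeros(d)          # current decision, kept inside [0, 1]^d
Q = 0.0                  # virtual queue = cumulative clipped violation
G2 = 1e-8                # AdaGrad accumulator: sum of squared grad norms
total_cost, total_viol = 0.0, 0.0

for t in range(T):
    # Adversary reveals cost f_t(x) = <c, x> and constraint
    # g_t(x) = <a, x> - b only AFTER x_t is played.
    c = rng.uniform(-1.0, 1.0, d)
    a = rng.uniform(-1.0, 1.0, d)
    b = 0.5
    total_cost += c @ x
    viol = max(a @ x - b, 0.0)          # clipped constraint violation
    total_viol += viol

    # Lyapunov step: the queue drifts upward with each violation.
    Q += viol

    # Surrogate gradient: cost term plus queue-weighted subgradient of
    # the clipped constraint (zero when the constraint is satisfied).
    grad = V * c + Q * (a if a @ x - b > 0 else np.zeros(d))

    # AdaGrad-style step size, then projection back onto the box.
    G2 += grad @ grad
    x = np.clip(x - grad / np.sqrt(G2), 0.0, 1.0)
```

The queue makes violated constraints progressively more expensive in the surrogate, which is what drives the CCV bound, while the adaptive step size supplies the regret guarantee without tuning to unknown gradient scales.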
Cite
Text
Sinha and Vaze. "Tight Bounds for Online Convex Optimization with Adversarial Constraints." ICML 2024 Workshops: RLControlTheory, 2024.
Markdown
[Sinha and Vaze. "Tight Bounds for Online Convex Optimization with Adversarial Constraints." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/sinha2024icmlw-tight/)
BibTeX
@inproceedings{sinha2024icmlw-tight,
title = {{Tight Bounds for Online Convex Optimization with Adversarial Constraints}},
author = {Sinha, Abhishek and Vaze, Rahul},
booktitle = {ICML 2024 Workshops: RLControlTheory},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/sinha2024icmlw-tight/}
}