Online Learning with Ranking Feedback and an Application to Equilibrium Computation
Abstract
Online learning in arbitrary and possibly adversarial environments has been extensively studied in sequential decision-making, with a strong connection to equilibrium computation in game theory. Most existing online learning algorithms are based on \emph{numeric} utility feedback from the environment, which may be unavailable in applications with humans in the loop and/or with privacy concerns. In this paper, we study an online learning model where only a \emph{ranking} of a set of proposed actions is provided to the learning agent at each timestep. We consider ranking models based on either the \emph{instantaneous} utility at each timestep or the \emph{time-average} utility up to the current timestep, in both \emph{full-information} and \emph{bandit} feedback settings. Focusing on the standard (external-)regret metric, we show that sublinear regret cannot be achieved in general with instantaneous-utility ranking feedback. Moreover, we show that when the ranking model is relatively \emph{deterministic} (\emph{i.e.,} with a small temperature), sublinear regret cannot be achieved with time-average-utility ranking feedback, either. We then propose new algorithms that achieve sublinear regret under the additional assumption that the utility vectors have sublinear variation. Notably, we also show that when time-average utility ranking is used, this additional assumption can be avoided in the full-information setting. As a consequence, we show that if all players follow our algorithms, an approximate coarse correlated equilibrium of a normal-form game can be found through repeated play. Finally, we validate the efficiency of our algorithms via numerical experiments.
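As a toy illustration (not taken from the paper; the action count and utility values are hypothetical, and the paper's stochastic temperature-based ranking model is omitted here), the two ranking feedback models from the abstract can be sketched as follows: at each round the learner observes only an ordering of the proposed actions, induced either by that round's utility vector or by the running average of utilities so far.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n_actions actions over T rounds,
# U[t, a] = (hidden) numeric utility of action a at round t.
T, n_actions = 5, 3
U = rng.uniform(size=(T, n_actions))

def instantaneous_ranking(t):
    """Ranking feedback induced by the utility at round t alone (best action first)."""
    return np.argsort(-U[t])

def time_average_ranking(t):
    """Ranking feedback induced by the average utility over rounds 0..t (best first)."""
    return np.argsort(-U[: t + 1].mean(axis=0))

# The learner only ever sees these orderings, never the numeric entries of U.
for t in range(T):
    print(t, instantaneous_ranking(t), time_average_ranking(t))
```

Note that both rankings here are deterministic given the utilities; the paper's negative result concerns precisely such low-temperature (near-deterministic) ranking models, which is why sublinear regret additionally requires sublinear variation of the utility vectors.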
Cite
Liu et al. "Online Learning with Ranking Feedback and an Application to Equilibrium Computation." ICLR 2025 Workshops: Bi-Align, 2025.
@inproceedings{liu2025iclrw-online,
title = {{Online Learning with Ranking Feedback and an Application to Equilibrium Computation}},
author = {Liu, Mingyang and Chen, Yongshan and Fan, Zhiyuan and Farina, Gabriele and Ozdaglar, Asuman E. and Zhang, Kaiqing},
booktitle = {ICLR 2025 Workshops: Bi-Align},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/liu2025iclrw-online/}
}