Bandit Online Optimization over the Permutahedron
Abstract
The permutahedron is the convex polytope with vertex set consisting of the vectors ( π (1),…, π ( n )) for all permutations (bijections) π over 1,…, n . We study a bandit game in which, at each step t , an adversary chooses a hidden weight weight vector s _ t , a player chooses a vertex π _ t of the permutahedron and suffers an observed instantaneous loss of $\sum_{i=1}^n\pi_t(i) s_t(i)$ . We study the problem in two regimes. In the first regime, s _ t is a point in the polytope dual to the permutahedron. Algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a regret of $O(n\sqrt{T \log n})$ after T steps. Unfortunately, CombBand requires at each step an n -by- n matrix permanent computation, a # P -hard problem. Approximating the permanent is possible in the impractical running time of O ( n ^10), with an additional heavy inverse-polynomial dependence on the sought accuracy. We provide an algorithm of slightly worse regret $O(n^{3/2}\sqrt{T})$ but with more realistic time complexity O ( n ^3) per step. The technical contribution is a bound on the variance of the Plackett-Luce noisy sorting process’s ‘pseudo loss’, obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices of rational functions in exponents of 3 parameters. In the second regime, s _ t is in the hypercube. For this case we present and analyze an algorithm based on Bubeck et al.’s (2012) OSMD approach with a novel projection and decomposition technique for the permutahedron. The algorithm is efficient and achieves a regret of $O(n\sqrt{T})$ , but for a more restricted space of possible loss vectors.
Cite
Text
Ailon et al. "Bandit Online Optimization over the Permutahedron." International Conference on Algorithmic Learning Theory, 2014. doi:10.1007/978-3-319-11662-4_16Markdown
[Ailon et al. "Bandit Online Optimization over the Permutahedron." International Conference on Algorithmic Learning Theory, 2014.](https://mlanthology.org/alt/2014/ailon2014alt-bandit/) doi:10.1007/978-3-319-11662-4_16BibTeX
@inproceedings{ailon2014alt-bandit,
title = {{Bandit Online Optimization over the Permutahedron}},
author = {Ailon, Nir and Hatano, Kohei and Takimoto, Eiji},
booktitle = {International Conference on Algorithmic Learning Theory},
year = {2014},
pages = {215-229},
doi = {10.1007/978-3-319-11662-4_16},
url = {https://mlanthology.org/alt/2014/ailon2014alt-bandit/}
}