Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback
Abstract
In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds improving on this bound. For smooth loss functions, we propose an algorithm with an $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^*_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^*_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\le U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving on the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
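For readability, the three expected regret bounds discussed above can be set side by side (restated verbatim from the abstract; $U$, $K$, $T$, $L_T(f^*_i)$, and $\mathcal{R}$ are as defined there):

```latex
% Bounds restated from the abstract, for comparison.
\begin{align*}
\text{previous (Lipschitz losses):}\quad
  & O\big((\Vert f\Vert^2_{\mathcal{H}_i}+1)\,K^{\frac{1}{3}}T^{\frac{2}{3}}\big)\\
\text{new (smooth losses, data-dependent):}\quad
  & O\Big(U^{\frac{2}{3}}K^{-\frac{1}{3}}\big(\textstyle\sum^K_{i=1}L_T(f^*_i)\big)^{\frac{2}{3}}\Big)\\
\text{new (Lipschitz losses):}\quad
  & O\big(U\sqrt{KT}\,\ln^{\frac{2}{3}}T\big)
\end{align*}
```

The data-dependent bound recovers the worst-case rate when $\sum^K_{i=1}L_T(f^*_i)=\Theta(KT)$ and is tighter when the optimal hypotheses in most candidate RKHSs incur small cumulative loss.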
Cite
Text
Li and Liao. "Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26412-2_21
Markdown
[Li and Liao. "Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/li2022ecmlpkdd-improved/) doi:10.1007/978-3-031-26412-2_21
BibTeX
@inproceedings{li2022ecmlpkdd-improved,
title = {{Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback}},
author = {Li, Junfan and Liao, Shizhong},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {333-348},
doi = {10.1007/978-3-031-26412-2_21},
url = {https://mlanthology.org/ecmlpkdd/2022/li2022ecmlpkdd-improved/}
}