Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback
Abstract
In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds improving on this bound. For smooth loss functions, we propose an algorithm with an $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^*_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^*_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\le U\}$. This data-dependent bound preserves the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving on the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds that match or improve the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
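For readability, the three expected regret bounds discussed above can be set side by side (restated verbatim from the abstract; $U$, $K$, $T$, $L_T(f^*_i)$, and $\mathcal{R}$ are as defined there):

```latex
% Bounds restated from the abstract, for comparison.
\begin{align*}
\text{previous (Lipschitz losses):}\quad
  & O\big((\Vert f\Vert^2_{\mathcal{H}_i}+1)\,K^{\frac{1}{3}}T^{\frac{2}{3}}\big)\\
\text{new (smooth losses, data-dependent):}\quad
  & O\Big(U^{\frac{2}{3}}K^{-\frac{1}{3}}\big(\textstyle\sum^K_{i=1}L_T(f^*_i)\big)^{\frac{2}{3}}\Big)\\
\text{new (Lipschitz losses):}\quad
  & O\big(U\sqrt{KT}\,\ln^{\frac{2}{3}}T\big)
\end{align*}
```

The data-dependent bound recovers the worst-case rate when $\sum^K_{i=1}L_T(f^*_i)=\Theta(KT)$ and is tighter when the optimal hypotheses in most candidate RKHSs incur small cumulative loss.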
Cite
Text
Li and Liao. "Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26412-2_21
Markdown
[Li and Liao. "Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/li2022ecmlpkdd-improved/) doi:10.1007/978-3-031-26412-2_21
BibTeX
@inproceedings{li2022ecmlpkdd-improved,
title = {{Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback}},
author = {Li, Junfan and Liao, Shizhong},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {333-348},
doi = {10.1007/978-3-031-26412-2_21},
url = {https://mlanthology.org/ecmlpkdd/2022/li2022ecmlpkdd-improved/}
}