Bi-Level Hierarchical Neural Contextual Bandits for Online Recommendation
Abstract
Contextual bandit algorithms aim to identify the optimal choice among a set of candidate arms, based on their contextual information. Among others, neural contextual bandit algorithms have demonstrated generally superior performance compared to conventional linear and kernel-based methods. Nevertheless, neural methods can be inherently unsuitable for handling a large number of candidate arms due to their high computational cost when performing principled exploration. Motivated by the widespread availability of arm category information (e.g., movie genres, retailer types), we formulate contextual bandits as a bi-level online recommendation problem, and propose a novel neural bandit framework, named $\text{H}_{2}\text{N-Bandit}$, which utilizes a bi-level hierarchical neural architecture to mitigate the substantial computational cost found in conventional neural bandit methods. To demonstrate its theoretical effectiveness, we provide regret analysis under general over-parameterization settings, along with a guarantee for category-level recommendation. To illustrate its effectiveness and efficiency, we conduct extensive experiments on multiple real-world data sets, highlighting that $\text{H}_{2}\text{N-Bandit}$ can significantly reduce the computational cost over existing strong non-linear baselines, while achieving better or comparable performance under online recommendation settings.
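As a rough illustration of why a bi-level formulation can reduce per-round cost, the sketch below first picks a category and then scores only the arms inside it. It is not the paper's algorithm: the function names, the toy data, and the random stand-in scores are all hypothetical, and a real neural bandit would replace them with network outputs plus an exploration bonus.

```python
import random


def pick_best(candidates, score):
    """Return the highest-scoring candidate and the number of candidates scored."""
    best = max(candidates, key=score)
    return best, len(candidates)


def bi_level_select(arms_by_category, category_score, arm_score):
    """Two-stage selection: choose a category, then an arm within that category."""
    # Level 1: score each category once and keep the best one.
    category, n_categories = pick_best(list(arms_by_category), category_score)
    # Level 2: score only the arms that belong to the chosen category.
    arm, n_arms = pick_best(arms_by_category[category], arm_score)
    return category, arm, n_categories + n_arms


# Hypothetical toy setup: 20 categories with 50 arms each (1000 arms in total).
rng = random.Random(0)
arms_by_category = {c: [(c, i) for i in range(50)] for c in range(20)}

# Random stand-ins for learned category-level and arm-level scores (assumption).
category_value = {c: rng.random() for c in arms_by_category}
arm_value = {a: rng.random() for arms in arms_by_category.values() for a in arms}

category, arm, n_scored = bi_level_select(
    arms_by_category, category_value.get, arm_value.get
)
n_flat = sum(len(arms) for arms in arms_by_category.values())

print(n_scored, n_flat)  # 70 candidates scored instead of 1000
```

In this toy setup the two-stage procedure scores 20 categories plus 50 arms, whereas a flat procedure would score all 1000 arms. The saving depends on how evenly arms are spread across categories, and the selected arm is only as good as the category-level choice, which is why a category-level guarantee matters.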
Cite
Text
Qi et al. "Bi-Level Hierarchical Neural Contextual Bandits for Online Recommendation." Transactions on Machine Learning Research, 2026.
Markdown
[Qi et al. "Bi-Level Hierarchical Neural Contextual Bandits for Online Recommendation." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/qi2026tmlr-bilevel/)
BibTeX
@article{qi2026tmlr-bilevel,
title = {{Bi-Level Hierarchical Neural Contextual Bandits for Online Recommendation}},
author = {Qi, Yunzhe and Zhou, Yao and Ban, Yikun and Stewart, Allan and Ruan, Chuanwei and He, Jiachuan and Prasad, Shishir Kumar and Wang, Haixun and He, Jingrui},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/qi2026tmlr-bilevel/}
}