Towards Domain Adaptive Neural Contextual Bandits

Abstract

Contextual bandit algorithms are essential for solving real-world decision-making problems. In practice, collecting a contextual bandit's feedback from different domains may involve different costs; for example, measuring drug reactions in mice (as a source domain) is far cheaper than in humans (as a target domain). Unfortunately, adapting a contextual bandit algorithm from a source domain to a target domain under distribution shift remains a major and largely unexplored challenge. In this paper, we introduce the first general domain adaptation method for contextual bandits. Our approach learns a bandit model for the target domain by collecting feedback from the source domain. Our theoretical analysis shows that our algorithm maintains a sub-linear regret bound even when adapting across domains. Empirical results show that our approach outperforms state-of-the-art contextual bandit algorithms on real-world datasets. Code will soon be available at https://github.com/Wang-ML-Lab/DABand.

Cite

Text

Wang et al. "Towards Domain Adaptive Neural Contextual Bandits." International Conference on Learning Representations, 2025.

Markdown

[Wang et al. "Towards Domain Adaptive Neural Contextual Bandits." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/wang2025iclr-domain/)

BibTeX

@inproceedings{wang2025iclr-domain,
  title     = {{Towards Domain Adaptive Neural Contextual Bandits}},
  author    = {Wang, Ziyan and Huo, Xiaoming and Wang, Hao},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/wang2025iclr-domain/}
}