Factorization Bandits for Interactive Recommendation

Abstract

We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, and an upper confidence bound (UCB) based item selection strategy is developed to balance the exploitation-exploration trade-off during online learning. Observable contextual features and dependencies among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and to alleviate cold-start in recommendation. A high-probability sublinear upper regret bound is proved for the developed algorithm, in which considerable regret reduction is achieved on both the user and item sides. Extensive experiments on both simulated and large-scale real-world datasets confirm the advantages of the proposed algorithm over several state-of-the-art factorization-based and bandit-based collaborative filtering methods.
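To make the selection strategy in the abstract concrete, below is a minimal Python sketch of a factorization-based UCB bandit in the spirit described above. It is not the paper's exact algorithm: the class name FactorUCBSketch, the learning rate lr, and the alternating update scheme (ridge regression for user factors, a gradient step for item factors) are illustrative assumptions, and the sketch omits the contextual features and user-dependency structure the paper additionally leverages.

import numpy as np

class FactorUCBSketch:
    """Hedged sketch of a factorization-based UCB bandit.

    Not the paper's exact method; hyperparameters and update
    rules here are illustrative assumptions.
    """

    def __init__(self, n_users, n_items, dim=5, alpha=0.5, lam=1.0, lr=0.01):
        rng = np.random.default_rng(0)
        # Item latent factors, learned incrementally (matrix completion).
        self.V = rng.normal(scale=0.1, size=(n_items, dim))
        self.alpha, self.lr = alpha, lr
        # Per-user ridge-regression statistics over item factors.
        self.A = {u: lam * np.eye(dim) for u in range(n_users)}
        self.b = {u: np.zeros(dim) for u in range(n_users)}

    def select(self, u, candidates):
        """UCB item selection: exploit the current preference estimate,
        explore in proportion to uncertainty in the user's latent vector."""
        A_inv = np.linalg.inv(self.A[u])
        theta = A_inv @ self.b[u]  # user latent estimate
        scores = [
            self.V[i] @ theta
            + self.alpha * np.sqrt(self.V[i] @ A_inv @ self.V[i])
            for i in candidates
        ]
        return candidates[int(np.argmax(scores))]

    def update(self, u, i, reward):
        """Incremental update: refresh the user's ridge statistics and
        take one SGD step on the chosen item's latent factors."""
        x = self.V[i]
        self.A[u] += np.outer(x, x)
        self.b[u] += reward * x
        theta = np.linalg.solve(self.A[u], self.b[u])
        self.V[i] += self.lr * (reward - x @ theta) * theta

A simulation loop would call select to pick an item for the current user, observe a (noisy) reward, and feed it back through update. Because the confidence term shrinks as A[u] accumulates observations, exploration tapers off automatically, which is the mechanism behind the sublinear regret behavior the abstract refers to.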

Cite

Text

Wang et al. "Factorization Bandits for Interactive Recommendation." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10936

Markdown

[Wang et al. "Factorization Bandits for Interactive Recommendation." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/wang2017aaai-factorization/) doi:10.1609/AAAI.V31I1.10936

BibTeX

@inproceedings{wang2017aaai-factorization,
  title     = {{Factorization Bandits for Interactive Recommendation}},
  author    = {Wang, Huazheng and Wu, Qingyun and Wang, Hongning},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {2695--2702},
  doi       = {10.1609/AAAI.V31I1.10936},
  url       = {https://mlanthology.org/aaai/2017/wang2017aaai-factorization/}
}