Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach

Abstract

Online decision making plays a crucial role in numerous real-world applications. In many scenarios, a decision is made by performing a sequence of tests on an incoming data point. However, performing all tests can be expensive and is not always feasible. In this paper, we provide a novel formulation of the online decision making problem based on combinatorial multi-armed bandits that takes the (possibly stochastic) cost of performing tests into account. Based on this formulation, we provide a new framework for cost-efficient online decision making which can utilize posterior sampling or BayesUCB for exploration. We provide a theoretical analysis of Thompson Sampling for cost-efficient online decision making, and present various experimental results that demonstrate the applicability of our framework to real-world problems.
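
To give a flavor of the cost-aware exploration idea, the sketch below shows Thompson Sampling over a set of tests with associated costs. This is a minimal illustration, not the paper's exact combinatorial formulation: it assumes each test is an independent Beta-Bernoulli arm with a known fixed cost, and it greedily picks the test with the best sampled success-to-cost ratio each round. All names (`n_tests`, `true_probs`, `costs`) are illustrative.

```python
import numpy as np

# Hedged sketch: Thompson Sampling with per-test costs.
# Assumptions (not from the paper): Bernoulli test outcomes,
# known fixed costs, and a greedy value-per-cost selection rule.

rng = np.random.default_rng(0)

n_tests = 5
true_probs = rng.uniform(0.1, 0.9, size=n_tests)  # unknown to the learner
costs = rng.uniform(0.5, 2.0, size=n_tests)       # known test costs

# Beta(1, 1) priors over each test's success probability.
alpha = np.ones(n_tests)
beta = np.ones(n_tests)

total_cost = 0.0
for t in range(1000):
    # Posterior sampling: draw one sample per test from its Beta posterior.
    theta = rng.beta(alpha, beta)
    # Select the test with the highest sampled success probability per unit cost.
    i = int(np.argmax(theta / costs))
    # Observe the stochastic test outcome and pay the test's cost.
    outcome = rng.random() < true_probs[i]
    total_cost += costs[i]
    # Conjugate Beta-Bernoulli update for the chosen test.
    alpha[i] += outcome
    beta[i] += 1 - outcome

print("posterior means:", alpha / (alpha + beta))
print("total cost paid:", round(total_cost, 2))
```

The same loop could swap the posterior draw for a BayesUCB-style upper quantile of each Beta posterior; the paper analyzes both exploration strategies in its combinatorial setting.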

Cite

Text

Rahbar et al. "Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach." Transactions on Machine Learning Research, 2025.

Markdown

[Rahbar et al. "Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/rahbar2025tmlr-costefficient/)

BibTeX

@article{rahbar2025tmlr-costefficient,
  title     = {{Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach}},
  author    = {Rahbar, Arman and Åkerblom, Niklas and Chehreghani, Morteza Haghir},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/rahbar2025tmlr-costefficient/}
}