Multitasking: Optimal Planning for Bandit Superprocesses

Abstract

A bandit superprocess is a decision problem composed from multiple independent Markov decision processes (MDPs), coupled only by the constraint that, at each time step, the agent may act in only one of the MDPs. Multitasking problems of this kind are ubiquitous in the real world, yet very little is known about them from a computational viewpoint, beyond the basic observation that optimal policies for the superprocess may prescribe actions that would be suboptimal for an MDP considered in isolation. (This observation implies that many applications of sequential decision analysis in practice are technically incorrect, since the decision problem being solved is typically part of a larger, unstated bandit superprocess.) The paper summarizes the state of the art in the theory of bandit superprocesses and contributes a novel upper bound on the global value function of a bandit superprocess, defined in terms of a direct relaxation of the arms. The bound is equivalent to an existing bound (the Whittle integral) and so provides insight into an otherwise opaque formula. We provide an algorithm to compute this bound and use it to derive the first practical algorithms to select optimal actions in bandit superprocesses. The algorithm operates by repeatedly establishing dominance relations between actions using upper and lower bounds on action values. Experiments indicate that the algorithm's run-time compares very favorably with that of other possible algorithms designed for more general factored MDPs.
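The dominance-based action selection described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the paper's implementation: the `tighten` callback is a hypothetical stand-in for whatever bound computation is used (e.g. the Whittle-integral upper bound and a lower bound from a fixed policy), and the interval-pruning loop shows the general idea of eliminating actions whose upper bound falls below another action's lower bound.

```python
# Sketch (assumed interface, not the paper's code): select an action in a
# bandit superprocess by interval dominance. Each candidate action carries a
# (lower, upper) bound on its value; an action is dominated when its upper
# bound is below some other action's lower bound.

def prune_dominated(bounds):
    """bounds: dict mapping action -> (lower, upper).
    Returns the actions not dominated by the best lower bound."""
    best_lower = max(lo for lo, _ in bounds.values())
    return {a: (lo, hi) for a, (lo, hi) in bounds.items() if hi >= best_lower}

def select_action(bounds, tighten, max_rounds=100):
    """tighten(action, lo, hi) -> (lo', hi') is a hypothetical callback that
    refines an action's value bounds (e.g. by computing tighter upper and
    lower bounds on its Q-value). Alternates pruning and tightening until a
    single action survives or the round budget is exhausted."""
    for _ in range(max_rounds):
        bounds = prune_dominated(bounds)
        if len(bounds) == 1:
            break
        bounds = {a: tighten(a, lo, hi) for a, (lo, hi) in bounds.items()}
    # Any surviving action is within the residual bound gap of optimal;
    # break ties by the highest lower bound.
    return max(bounds, key=lambda a: bounds[a][0])
```

The design point the abstract emphasizes is that dominance tests can settle the choice of action without ever computing exact superprocess values, which is what makes the approach practical.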

Cite

Text

Hadfield-Menell and Russell. "Multitasking: Optimal Planning for Bandit Superprocesses." Conference on Uncertainty in Artificial Intelligence, 2015.

Markdown

[Hadfield-Menell and Russell. "Multitasking: Optimal Planning for Bandit Superprocesses." Conference on Uncertainty in Artificial Intelligence, 2015.](https://mlanthology.org/uai/2015/hadfieldmenell2015uai-multitasking/)

BibTeX

@inproceedings{hadfieldmenell2015uai-multitasking,
  title     = {{Multitasking: Optimal Planning for Bandit Superprocesses}},
  author    = {Hadfield-Menell, Dylan and Russell, Stuart},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2015},
  pages     = {345--354},
  url       = {https://mlanthology.org/uai/2015/hadfieldmenell2015uai-multitasking/}
}