Multi-Objective Monte-Carlo Tree Search

Abstract

This paper addresses multi-objective reinforcement learning (MORL) and presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The well-known hypervolume indicator is used to define an action selection criterion that replaces the UCB criterion in order to handle multi-dimensional rewards. MO-MCTS is first compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. A scalability study on the NP-hard problem of grid scheduling then shows that the performance of MO-MCTS matches the non-RL-based state of the art, albeit at a higher computational cost.
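
To make the indicator named in the abstract concrete, the following is a minimal sketch of the hypervolume for two objectives, assuming both are maximized and measured against a reference point. The function names `hypervolume_2d` and `hypervolume_gain`, the reference point, and the sample archive are illustrative assumptions; this is not the paper's action selection rule, only the underlying quantity such a rule could build on.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives maximized)."""
    # Keep only points that strictly improve on the reference in both objectives.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        # Only a point that raises the second objective adds a new strip of area.
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hypervolume_gain(archive: List[Point], candidate: Point, ref: Point) -> float:
    """Increase in hypervolume if `candidate` were added to `archive` (hypothetical helper)."""
    return hypervolume_2d(archive + [candidate], ref) - hypervolume_2d(archive, ref)


# Illustrative data, not taken from the paper.
archive = [(3.0, 1.0), (1.0, 3.0)]
ref = (0.0, 0.0)
print(hypervolume_2d(archive, ref))
print(hypervolume_gain(archive, (2.0, 2.0), ref))  # non-dominated candidate
print(hypervolume_gain(archive, (1.0, 1.0), ref))  # dominated candidate
```

A candidate dominated by the archive contributes zero gain, while a non-dominated one increases the dominated area. This is why the indicator can rank vector-valued rewards that a scalar criterion such as UCB cannot compare directly.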

Cite

Text

Wang and Sebag. "Multi-Objective Monte-Carlo Tree Search." Proceedings of the Fourth Asian Conference on Machine Learning, 2012.

Markdown

[Wang and Sebag. "Multi-Objective Monte-Carlo Tree Search." Proceedings of the Fourth Asian Conference on Machine Learning, 2012.](https://mlanthology.org/acml/2012/wang2012acml-multiobjective/)

BibTeX

@inproceedings{wang2012acml-multiobjective,
  title     = {{Multi-Objective Monte-Carlo Tree Search}},
  author    = {Wang, Weijia and Sebag, Michèle},
  booktitle = {Proceedings of the Fourth Asian Conference on Machine Learning},
  year      = {2012},
  pages     = {507--522},
  volume    = {25},
  url       = {https://mlanthology.org/acml/2012/wang2012acml-multiobjective/}
}