Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring
Abstract
Partial monitoring is a general model for sequential learning with limited feedback, formalized as a game between two players. In this game, the learner chooses an action while the opponent simultaneously chooses an outcome; the learner then suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that characterizes the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). Then, we derive an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.
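For readers unfamiliar with the setting, the following is a minimal sketch of the stochastic partial monitoring protocol and the regret quantity being minimized. It is illustrative only: the matrices, the uniform placeholder policy, and all variable names are assumptions made for this sketch, not the paper's PM-DMED algorithm.

```python
import numpy as np

# Sketch of a stochastic partial monitoring game (illustrative, not the paper's code).
# L[i, j]: loss when the learner plays action i and the outcome is j.
# H[i, j]: feedback signal the learner observes in that case (losses are never shown).
# p_star:  fixed but unknown outcome distribution of the stochastic opponent.

rng = np.random.default_rng(0)
L = np.array([[0.0, 1.0], [1.0, 0.0]])     # example loss matrix
H = np.array([[0, 1], [0, 0]])             # example signal matrix
p_star = np.array([0.3, 0.7])              # unknown outcome distribution

T = 1000
best_loss = (L @ p_star).min()             # per-round expected loss of the best fixed action
total_loss = 0.0

for t in range(T):
    i = rng.integers(L.shape[0])           # placeholder policy; PM-DMED would choose here
    j = rng.choice(len(p_star), p=p_star)  # opponent draws an outcome
    total_loss += L[i, j]                  # loss is suffered but not revealed
    signal = H[i, j]                       # only this feedback is observed by the learner

regret = total_loss - T * best_loss        # regret the learner aims to keep logarithmic in T
print(regret)
```

The lower bound in the paper says that, for any consistent strategy, this regret must grow at least logarithmically in T with a distribution-dependent constant, and PM-DMED-Hinge attains that constant asymptotically.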
Cite
Text
Komiyama et al. "Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring." Neural Information Processing Systems, 2015.

Markdown
[Komiyama et al. "Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring." Neural Information Processing Systems, 2015.](https://mlanthology.org/neurips/2015/komiyama2015neurips-regret/)

BibTeX
@inproceedings{komiyama2015neurips-regret,
title = {{Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring}},
author = {Komiyama, Junpei and Honda, Junya and Nakagawa, Hiroshi},
booktitle = {Neural Information Processing Systems},
year = {2015},
pages = {1792--1800},
url = {https://mlanthology.org/neurips/2015/komiyama2015neurips-regret/}
}