Learning Policies for Contextual Submodular Prediction
Abstract
Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learns a sequence of classifiers or relies on stronger assumptions such as realizability, we ensure both data efficiency and performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems including manipulator trajectory optimization, news recommendation, and document summarization.
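To make the abstract's setting concrete, here is a minimal sketch (not the paper's algorithm) of the kind of objective involved: greedily building a list under a monotone submodular coverage reward, where each added item is valued by its marginal gain, so redundant items contribute little and the list balances quality and diversity. The topic-coverage reward and all names here are illustrative assumptions.

```python
def coverage_reward(chosen, item_topics):
    """Submodular reward (illustrative): number of distinct topics covered."""
    covered = set()
    for item in chosen:
        covered |= item_topics[item]
    return len(covered)

def greedy_list(items, item_topics, budget):
    """Greedily add the item with the largest marginal gain at each step.

    For monotone submodular rewards, greedy selection achieves the
    classical (1 - 1/e) approximation to the best list of the same size.
    """
    chosen = []
    for _ in range(budget):
        best, best_gain = None, 0
        for item in items:
            if item in chosen:
                continue
            gain = (coverage_reward(chosen + [item], item_topics)
                    - coverage_reward(chosen, item_topics))
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:  # no remaining item adds value
            break
        chosen.append(best)
    return chosen
```

For example, with `item_topics = {"a": {1, 2}, "b": {2, 3}, "c": {3}}` and a budget of 2, the greedy list picks "a" (gain 2) and then "b" (gain 1), skipping the redundant "c". The paper's contribution is learning a single policy that makes these list choices from context, rather than running greedy with a known reward.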
Cite
Text
Ross et al. "Learning Policies for Contextual Submodular Prediction." International Conference on Machine Learning, 2013.
Markdown
[Ross et al. "Learning Policies for Contextual Submodular Prediction." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/ross2013icml-learning/)
BibTeX
@inproceedings{ross2013icml-learning,
title = {{Learning Policies for Contextual Submodular Prediction}},
author = {Ross, Stephane and Zhou, Jiaji and Yue, Yisong and Dey, Debadeepta and Bagnell, Drew},
booktitle = {International Conference on Machine Learning},
year = {2013},
pages = {1364-1372},
volume = {28},
url = {https://mlanthology.org/icml/2013/ross2013icml-learning/}
}