Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities
Abstract
In reinforcement learning, policies are typically evaluated according to the expectation of cumulative rewards. Researchers in decision theory have argued that more sophisticated decision criteria can better model the preferences of a decision maker. In particular, Skew-Symmetric Bilinear (SSB) utility functions generalize von Neumann and Morgenstern's expected utility (EU) theory to encompass rational decision behaviors that EU cannot accommodate. In this paper, we adopt an SSB utility function to compare policies in the reinforcement learning setting. We provide a model-free SSB reinforcement learning algorithm, SSB Q-learning, and prove its convergence towards a policy that is epsilon-optimal according to SSB. The proposed algorithm is an adaptation of fictitious play [Brown, 1951] combined with techniques from stochastic approximation [Borkar, 1997]. We also present experimental results evaluating our approach in a variety of settings.
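To illustrate the decision criterion the abstract refers to: under an SSB utility function φ (skew-symmetric, so φ(x, y) = -φ(y, x)), a policy inducing a return distribution P is preferred to one inducing Q whenever E[φ(X, Y)] ≥ 0 for independent X ~ P, Y ~ Q. The sketch below is illustrative only and is not the paper's algorithm; the distributions, the particular φ (a probabilistic-dominance comparison, one behavior EU cannot express), and the function names are assumptions made for the example.

```python
import itertools

def ssb_value(phi, dist_x, dist_y):
    """E[phi(X, Y)] for independent X ~ dist_x, Y ~ dist_y,
    where each distribution is a dict outcome -> probability."""
    return sum(px * py * phi(x, y)
               for (x, px), (y, py)
               in itertools.product(dist_x.items(), dist_y.items()))

# A skew-symmetric phi encoding probabilistic dominance:
# phi(x, y) = +1 if x > y, -1 if x < y, 0 if equal.
phi = lambda x, y: (x > y) - (x < y)

# Two hypothetical lotteries over cumulative rewards.
p = {0: 0.4, 10: 0.6}
q = {3: 1.0}

# Positive value means p is preferred to q under this phi.
value = ssb_value(phi, p, q)  # 0.4 * (-1) + 0.6 * (+1) = 0.2
```

Note that skew-symmetry makes the comparison antisymmetric: swapping the two distributions flips the sign, so exactly one of the two policies is (weakly) preferred.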
Cite
Text
Gilbert et al. "Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities." Conference on Uncertainty in Artificial Intelligence, 2016.
Markdown
[Gilbert et al. "Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities." Conference on Uncertainty in Artificial Intelligence, 2016.](https://mlanthology.org/uai/2016/gilbert2016uai-model/)
BibTeX
@inproceedings{gilbert2016uai-model,
title = {{Model-Free Reinforcement Learning with Skew-Symmetric Bilinear Utilities}},
author = {Gilbert, Hugo and Zanuttini, Bruno and Weng, Paul and Viappiani, Paolo and Nicart, Esther},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2016},
url = {https://mlanthology.org/uai/2016/gilbert2016uai-model/}
}