Utile Distinction Hidden Markov Models

Abstract

This paper addresses the problem of constructing good action selection policies for agents acting in partially observable environments, a class of problems generally known as Partially Observable Markov Decision Processes. We present a novel approach that uses a modification of the well-known Baum-Welch algorithm for learning a Hidden Markov Model (HMM) to predict both percepts and utility in a non-deterministic world. This enables an agent to make decisions based on its previous history of actions, observations, and rewards. Our algorithm, called Utile Distinction Hidden Markov Models (UDHMM), handles the creation of memory well in that it tends to create perceptual and utility distinctions only when needed, while it can still discriminate states based on histories of arbitrary length. The experimental results in highly stochastic problem domains show very good performance.
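The core idea in the abstract is a Baum-Welch variant whose hidden states must explain rewards as well as percepts. As a minimal sketch of that idea (not the paper's exact UDHMM algorithm), the following implements one scaled Baum-Welch EM step for a discrete HMM whose states emit a percept and a reward, modeled here as conditionally independent given the state — an assumption made for illustration, as are all names and shapes below.

```python
import numpy as np

def forward_backward(A, Bo, Br, pi, obs, rew):
    """E-step for an HMM emitting (percept, reward) pairs.
    A:  (S,S) transitions; Bo: (S,O) percept emissions;
    Br: (S,R) reward emissions; pi: (S,) initial distribution.
    Uses standard per-step scaling to avoid underflow."""
    T, S = len(obs), len(pi)
    # Per-step joint emission probability (independence is an assumption):
    em = Bo[:, obs].T * Br[:, rew].T          # shape (T, S)
    alpha = np.zeros((T, S)); c = np.zeros(T)
    alpha[0] = pi * em[0]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * em[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (em[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                       # state posteriors, rows sum to 1
    loglik = np.log(c).sum()
    return gamma, alpha, beta, em, c, loglik

def baum_welch_step(A, Bo, Br, pi, obs, rew):
    """One EM update; returns new parameters and the log-likelihood
    of the data under the *old* parameters."""
    gamma, alpha, beta, em, c, loglik = forward_backward(A, Bo, Br, pi, obs, rew)
    T, S = len(obs), len(pi)
    xi = np.zeros((S, S))                      # expected transition counts
    for t in range(T - 1):
        xi += np.outer(alpha[t], em[t + 1] * beta[t + 1]) * A / c[t + 1]
    A_new = xi / xi.sum(axis=1, keepdims=True)
    Bo_new = np.zeros_like(Bo); Br_new = np.zeros_like(Br)
    for t in range(T):                         # expected emission counts
        Bo_new[:, obs[t]] += gamma[t]
        Br_new[:, rew[t]] += gamma[t]
    Bo_new /= Bo_new.sum(axis=1, keepdims=True)
    Br_new /= Br_new.sum(axis=1, keepdims=True)
    pi_new = gamma[0] / gamma[0].sum()
    return A_new, Bo_new, Br_new, pi_new, loglik
```

Because the reward stream enters the emission term, states that are perceptually identical but differ in utility are pulled apart during EM, which is the "utile distinction" intuition; the paper's actual algorithm adds machinery to create such distinctions only when they are needed.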

Cite

Text

Wierstra and Wiering. "Utile Distinction Hidden Markov Models." International Conference on Machine Learning, 2004. doi:10.1145/1015330.1015346

Markdown

[Wierstra and Wiering. "Utile Distinction Hidden Markov Models." International Conference on Machine Learning, 2004.](https://mlanthology.org/icml/2004/wierstra2004icml-utile/) doi:10.1145/1015330.1015346

BibTeX

@inproceedings{wierstra2004icml-utile,
  title     = {{Utile Distinction Hidden Markov Models}},
  author    = {Wierstra, Daan and Wiering, Marco A.},
  booktitle = {International Conference on Machine Learning},
  year      = {2004},
  doi       = {10.1145/1015330.1015346},
  url       = {https://mlanthology.org/icml/2004/wierstra2004icml-utile/}
}