Unsupervised Modeling of Partially Observable Environments

Abstract

We present an architecture based on self-organizing maps for learning a sensory layer in a learning system. The architecture, temporal network for transitions (TNT), enjoys the freedoms of unsupervised learning, works on-line, in non-episodic environments, is computationally light, and scales well. TNT generates a predictive model of its internal representation of the world, making planning methods available for both the exploitation and exploration of the environment. Experiments demonstrate that TNT learns nice representations of classical reinforcement learning mazes of varying size (up to 20×20) under conditions of high noise and stochastic actions.

Cite

Text

Graziano et al. "Unsupervised Modeling of Partially Observable Environments." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011. doi:10.1007/978-3-642-23780-5_42

Markdown

[Graziano et al. "Unsupervised Modeling of Partially Observable Environments." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.](https://mlanthology.org/ecmlpkdd/2011/graziano2011ecmlpkdd-unsupervised/) doi:10.1007/978-3-642-23780-5_42

BibTeX

@inproceedings{graziano2011ecmlpkdd-unsupervised,
  title     = {{Unsupervised Modeling of Partially Observable Environments}},
  author    = {Graziano, Vincent and Koutník, Jan and Schmidhuber, Jürgen},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2011},
  pages     = {503--515},
  doi       = {10.1007/978-3-642-23780-5_42},
  url       = {https://mlanthology.org/ecmlpkdd/2011/graziano2011ecmlpkdd-unsupervised/}
}