Unsupervised Modeling of Partially Observable Environments
Abstract
We present an architecture based on self-organizing maps for learning a sensory layer in a learning system. The architecture, temporal network for transitions (TNT), enjoys the freedoms of unsupervised learning, works on-line and in non-episodic environments, is computationally light, and scales well. TNT generates a predictive model of its internal representation of the world, making planning methods available for both the exploitation and exploration of the environment. Experiments demonstrate that TNT learns good representations of classical reinforcement learning mazes of varying size (up to 20×20) under conditions of high noise and stochastic actions.
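As background for the self-organizing maps the abstract builds on, the following is a minimal sketch of one on-line update of a generic SOM. It is not the TNT architecture itself, which the abstract does not specify in detail. The function names, the 1-D chain topology, the Gaussian neighborhood, and the parameter values (`lr`, `sigma`) are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def som_step(weights, x, lr=0.5, sigma=1.0):
    """One on-line SOM update on a 1-D chain of nodes (modifies weights in place).

    Illustrative only: a generic SOM rule, not the update used by TNT.
    """
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        # Gaussian neighborhood: nodes near the winner on the chain move more.
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        for d in range(len(w)):
            w[d] += lr * h * (x[d] - w[d])
    return bmu


# Hypothetical usage: five nodes with 2-D weights and a single observation.
rng = random.Random(0)
weights = [[rng.random(), rng.random()] for _ in range(5)]
observation = [0.2, 0.8]

before = math.dist(weights[best_matching_unit(weights, observation)], observation)
winner = som_step(weights, observation)
after = math.dist(weights[winner], observation)
```

Each observation is processed once as it arrives, with no labels and no episode boundaries, which is the sense in which such a map can be trained on-line and unsupervised. With the assumed `lr=0.5`, the winning node moves halfway toward the observation.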
Cite
Text
Graziano et al. "Unsupervised Modeling of Partially Observable Environments." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011. doi:10.1007/978-3-642-23780-5_42
Markdown
[Graziano et al. "Unsupervised Modeling of Partially Observable Environments." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.](https://mlanthology.org/ecmlpkdd/2011/graziano2011ecmlpkdd-unsupervised/) doi:10.1007/978-3-642-23780-5_42
BibTeX
@inproceedings{graziano2011ecmlpkdd-unsupervised,
title = {{Unsupervised Modeling of Partially Observable Environments}},
author = {Graziano, Vincent and Koutník, Jan and Schmidhuber, Jürgen},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2011},
pages = {503--515},
doi = {10.1007/978-3-642-23780-5_42},
url = {https://mlanthology.org/ecmlpkdd/2011/graziano2011ecmlpkdd-unsupervised/}
}