Learning to Play Text-Based Adventure Games with Maximum Entropy Reinforcement Learning

Abstract

Text-based games are a popular testbed for language-based reinforcement learning (RL). Previous work commonly uses deep Q-learning as the learning agent. However, Q-learning algorithms are challenging to apply to complex real-world domains because of, for example, their instability during training. In this paper, we therefore adapt the soft actor-critic (SAC) algorithm to the text-based environment. To deal with sparse extrinsic rewards from the environment, we combine SAC with a potential-based reward shaping technique that provides more informative (dense) reward signals to the RL agent; in particular, we use a dynamically learned value function as the potential function for shaping the learner's original sparse reward signals. We apply our method to difficult text-based games. The SAC method achieves higher scores than the Q-learning methods on many games with only half the number of training steps, indicating that it is well suited to text-based games. Moreover, we show that reward shaping helps the agent learn the policy faster and achieve higher scores.
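To make the reward shaping idea concrete, the following is a minimal sketch of generic potential-based shaping, in which the shaping term gamma * Phi(s') - Phi(s) is added to the extrinsic reward. It is not the authors' implementation: the function name `shaped_reward`, the toy state names, the value table, and the convention of a zero potential at terminal states are all assumptions made for illustration. In the paper's setting the potential would be a learned value function rather than a fixed table.

```python
from typing import Callable, Hashable


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return reward + gamma * Phi(next_state) - Phi(state).

    `potential` stands in for a value estimate (hypothetical here).
    Treating the terminal potential as zero is an assumed convention.
    """
    next_potential = 0.0 if done else potential(next_state)
    return reward + gamma * next_potential - potential(state)


# Toy value table standing in for a learned value function (assumed values).
toy_values = {"kitchen": 0.2, "hallway": 0.5}


def toy_potential(state: Hashable) -> float:
    # Unknown states default to a potential of zero.
    return toy_values.get(state, 0.0)


# A step with zero extrinsic reward still yields a dense signal
# when it moves the agent to a state with a higher estimated value.
dense = shaped_reward(0.0, "kitchen", "hallway", toy_potential, gamma=0.9)
```

With these toy numbers, moving from the lower-valued state to the higher-valued one produces a positive shaped reward even though the environment itself returned zero, which is the sense in which shaping makes a sparse signal denser.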

Cite

Text

Li et al. "Learning to Play Text-Based Adventure Games with Maximum Entropy Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_3

Markdown

[Li et al. "Learning to Play Text-Based Adventure Games with Maximum Entropy Reinforcement Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/li2023ecmlpkdd-learning/) doi:10.1007/978-3-031-43421-1_3

BibTeX

@inproceedings{li2023ecmlpkdd-learning,
  title     = {{Learning to Play Text-Based Adventure Games with Maximum Entropy Reinforcement Learning}},
  author    = {Li, Weichen and Devidze, Rati and Fellenz, Sophie},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {39--54},
  doi       = {10.1007/978-3-031-43421-1_3},
  url       = {https://mlanthology.org/ecmlpkdd/2023/li2023ecmlpkdd-learning/}
}