Quantifying Generalisation in Imitation Learning

Abstract

Imitation learning benchmarks often lack sufficient variation between training and evaluation, limiting meaningful generalisation assessment. We introduce Labyrinth, a benchmarking environment designed to test generalisation with precise control over structure, start and goal positions, and task complexity. It enables verifiably distinct training, evaluation, and test settings. Labyrinth provides a discrete, fully observable state space and known optimal actions, supporting interpretability and fine-grained evaluation. Its flexible setup allows targeted testing of generalisation factors and includes variants like partial observability, key-and-door tasks, and ice-floor hazards. By enabling controlled, reproducible experiments, Labyrinth advances the evaluation of generalisation in imitation learning and provides a valuable tool for developing more robust agents.
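To make the abstract's claims concrete, below is a minimal, hypothetical sketch of the kind of controlled grid-maze setup it describes: layouts, start/goal positions, and split membership are all fixed by seeds, so training, evaluation, and test configurations are verifiably disjoint. This is not the actual Labyrinth API (which the page does not show); all names (`MazeConfig`, `make_splits`, `GridMaze`) are assumptions for illustration.

```python
# Hypothetical sketch, NOT the Labyrinth API: a discrete, fully observable
# grid maze whose layout, start, and goal are determined by a seed, with
# disjoint seed sets for train / eval / test splits.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MazeConfig:
    size: int   # grid side length
    seed: int   # controls wall layout, start, and goal placement
    split: str  # "train", "eval", or "test"


def make_splits(n_train, n_eval, n_test, size=9, base_seed=0):
    """Draw non-overlapping layout seeds so the three splits never share a maze."""
    rng = random.Random(base_seed)
    seeds = rng.sample(range(10**6), n_train + n_eval + n_test)  # all distinct
    return (
        [MazeConfig(size, s, "train") for s in seeds[:n_train]],
        [MazeConfig(size, s, "eval") for s in seeds[n_train:n_train + n_eval]],
        [MazeConfig(size, s, "test") for s in seeds[n_train + n_eval:]],
    )


class GridMaze:
    """Discrete maze; optimal actions could be recovered by breadth-first
    search over the known layout (omitted here, as are solvability checks)."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, config: MazeConfig):
        rng = random.Random(config.seed)
        n = config.size
        self.size = n
        self.walls = {(r, c) for r in range(n) for c in range(n)
                      if rng.random() < 0.2}
        free = [(r, c) for r in range(n) for c in range(n)
                if (r, c) not in self.walls]
        self.start, self.goal = rng.sample(free, 2)
        self.pos = self.start

    def reset(self):
        self.pos = self.start
        return self.pos  # fully observable state: the agent's coordinates

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if 0 <= r < self.size and 0 <= c < self.size and (r, c) not in self.walls:
            self.pos = (r, c)
        done = self.pos == self.goal
        return self.pos, float(done), done


# Usage: train only on the train split, report results on held-out mazes.
train, evaluation, test = make_splits(n_train=100, n_eval=20, n_test=20)
env = GridMaze(train[0])
state = env.reset()
```

Keying every maze to a seed and partitioning the seed space is one simple way to obtain the "verifiably distinct" splits the abstract refers to; variants such as partial observability or key-and-door tasks would extend the state and transition logic rather than the split mechanism.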

Cite

Text

Gavenski and Rodrigues. "Quantifying Generalisation in Imitation Learning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Gavenski and Rodrigues. "Quantifying Generalisation in Imitation Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/gavenski2025neurips-quantifying/)

BibTeX

@inproceedings{gavenski2025neurips-quantifying,
  title     = {{Quantifying Generalisation in Imitation Learning}},
  author    = {Gavenski, Nathan and Rodrigues, Odinaldo},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/gavenski2025neurips-quantifying/}
}