Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
Abstract
Large Language Model (LLM) agents are reshaping the game industry by enabling more intelligent and human-preferable characters. Yet current game benchmarks fall short of practical needs: they lack evaluations of diverse LLM capabilities across game genres, studies of the agentic modules crucial for complex gameplay, and fine-tuning datasets for adapting pre-trained LLMs into gaming agents. To fill these gaps, we present Orak, a benchmark for training and evaluating LLM agents across 12 popular video games spanning all major genres. Using a plug-and-play interface built on the Model Context Protocol (MCP), Orak supports systematic and reproducible studies of agentic modules in varied game scenarios. We further release a fine-tuning dataset of expert LLM gameplay trajectories spanning multiple genres, which turns general LLMs into effective game agents. Orak offers a comprehensive evaluation framework, including game leaderboards, LLM battle arenas, and in-depth analyses of input modality, agentic strategies, and fine-tuning effects, establishing a foundation toward versatile gaming agents. Code and datasets are available at https://github.com/krafton-ai/Orak and https://huggingface.co/datasets/KRAFTON/Orak.
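The value of a "plug-and-play" interface is that the game side and the agent side meet at one fixed contract, so agentic modules can be toggled without touching any game code. The sketch below is a minimal illustration of that idea under stated assumptions. It is not Orak's actual API and does not use the MCP SDK. `GameEnv`, `observe`, `act`, `Agent`, `reflection`, `planning`, `CounterGame`, and `run_episode` are all hypothetical names, and a fixed function stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class GameEnv(Protocol):
    """Hypothetical uniform game contract (not Orak's real interface)."""

    def observe(self) -> str: ...

    def act(self, action: str) -> tuple[str, float, bool]: ...


# A hypothetical agentic module: any function that enriches the prompt context.
Module = Callable[[dict], dict]


def reflection(ctx: dict) -> dict:
    # Toy "reflection": record a note about the previous action, if any.
    if ctx["history"]:
        ctx["notes"].append(f"last action was {ctx['history'][-1]}")
    return ctx


def planning(ctx: dict) -> dict:
    # Toy "planning": add a fixed plan note.
    ctx["notes"].append("plan: move toward the target")
    return ctx


@dataclass
class Agent:
    modules: list[Module]
    policy: Callable[[dict], str]  # stands in for the LLM call
    history: list[str] = field(default_factory=list)

    def step(self, observation: str) -> str:
        ctx = {"obs": observation, "history": self.history, "notes": []}
        for module in self.modules:  # modules run in order; the game never sees them
            ctx = module(ctx)
        action = self.policy(ctx)
        self.history.append(action)
        return action


class CounterGame:
    """Toy game: send 'inc' until the counter reaches the target."""

    def __init__(self, target: int = 3) -> None:
        self.value, self.target = 0, target

    def observe(self) -> str:
        return f"value={self.value}, target={self.target}"

    def act(self, action: str) -> tuple[str, float, bool]:
        if action == "inc":
            self.value += 1
        done = self.value >= self.target
        return self.observe(), float(done), done


def run_episode(env: GameEnv, agent: Agent, max_steps: int = 10) -> float:
    # Sum rewards until the game reports it is done or the step budget runs out.
    obs, total = env.observe(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.act(agent.step(obs))
        total += reward
        if done:
            break
    return total


def always_inc(ctx: dict) -> str:
    # Placeholder policy; a real agent would prompt an LLM with ctx here.
    return "inc"


if __name__ == "__main__":
    # Same game, three module configurations: only the agent side changes.
    for mods in ([], [reflection], [reflection, planning]):
        score = run_episode(CounterGame(), Agent(mods, always_inc))
        print([m.__name__ for m in mods], score)
```

Because every game honors the same `observe`/`act` contract, an ablation over agentic modules reduces to changing the `modules` list. That kind of controlled, reproducible comparison is what the abstract attributes to Orak's interface.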
Cite
Text
Park et al. "Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games." International Conference on Learning Representations, 2026.
Markdown
[Park et al. "Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/park2026iclr-orak/)
BibTeX
@inproceedings{park2026iclr-orak,
title = {{Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games}},
author = {Park, Dongmin and Kim, Minkyu and Choi, Beongjun and Kim, Junhyuck and Lee, Keon and Lee, Jonghyun and Park, Inkyu and Lee, Byeong-Uk and Hwang, Jaeyoung and Ahn, Jaewoo and Mahabaleshwarkar, Ameya Sunil and Kartal, Bilal and Biswas, Pritam and Suhara, Yoshi and Lee, Kangwook and Cho, Jaewoong},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/park2026iclr-orak/}
}