Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

Park, Dongmin; Kim, Minkyu; Choi, Beongjun; Kim, Junhyuck; Lee, Keon; Lee, Jonghyun; Park, Inkyu; Lee, Byeong-Uk; Hwang, Jaeyoung; Ahn, Jaewoo; Mahabaleshwarkar, Ameya Sunil; Kartal, Bilal; Biswas, Pritam; Suhara, Yoshi; Lee, Kangwook; Cho, Jaewoong

ICLR 2026

Abstract

Large Language Model (LLM) agents are reshaping the game industry by enabling more intelligent and human-preferable characters. Yet current game benchmarks fall short of practical needs: they lack evaluations of diverse LLM capabilities across various game genres, studies of agentic modules crucial for complex gameplay, and fine-tuning datasets to adapt pre-trained LLMs into gaming agents. To fill these gaps, we present Orak, a benchmark for training and evaluating LLM agents across 12 popular video games spanning all major genres. Using a plug-and-play interface built on Model Context Protocol (MCP), Orak supports systematic and reproducible studies of agentic modules in varied game scenarios. We further release a fine-tuning dataset of expert LLM gameplay trajectories spanning multiple genres, which turns general LLMs into effective game agents. Orak offers a comprehensive evaluation framework, including game leaderboards, LLM battle arenas, and in-depth analyses of input modality, agentic strategies, and fine-tuning effects, establishing a foundation for versatile gaming agents. Code and datasets are available at https://github.com/krafton-ai/Orak and https://huggingface.co/datasets/KRAFTON/Orak.

Cite

Text

Park et al. "Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games." International Conference on Learning Representations, 2026.

Markdown

[Park et al. "Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/park2026iclr-orak/)

BibTeX

@inproceedings{park2026iclr-orak,
  title     = {{Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games}},
  author    = {Park, Dongmin and Kim, Minkyu and Choi, Beongjun and Kim, Junhyuck and Lee, Keon and Lee, Jonghyun and Park, Inkyu and Lee, Byeong-Uk and Hwang, Jaeyoung and Ahn, Jaewoo and Mahabaleshwarkar, Ameya Sunil and Kartal, Bilal and Biswas, Pritam and Suhara, Yoshi and Lee, Kangwook and Cho, Jaewoong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/park2026iclr-orak/}
}