Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment

Boruch-Gruszecki, Aleksander; Zi, Yangtian; Wu, Zixuan; Oberoi, Tejas; Anderson, Carolyn Jane; Biswas, Joydeep; Guha, Arjun

ICLR 2026

Abstract

Large language models (LLMs) already excel at writing code in high-resource languages such as Python and JavaScript, yet stumble on low-resource languages that remain essential to science and engineering. Besides the obvious shortage of pre-training data, post-training itself is a bottleneck: every new language seems to require new datasets, test harnesses, and reinforcement learning (RL) infrastructure. We introduce Agnostics, a language-agnostic post-training pipeline that eliminates this per-language engineering. The key idea is to judge code solely by its externally observable behavior, so a single verifier can test solutions written in any language. Concretely, we (i) use an LLM to rewrite existing unit-test datasets into an I/O format, (ii) supply a short configuration that tells the verifier how to compile and run a target language, and (iii) apply reinforcement learning with verifiable rewards (RLVR) in a robust code execution environment. Applied to five low-resource languages—Lua, Julia, R, OCaml, and Fortran—Agnostics (1) improves Qwen-3 4B to performance that rivals other 16B–70B open-weight models; (2) scales cleanly to larger and diverse model families (Qwen-3 8B, DeepSeek Coder 6.7B Instruct, SmolLM3, Phi 4 Mini); and (3) for open-weight models with ≤16B parameters, sets new state-of-the-art pass@1 results on MultiPL-E and a new multi-language version of LiveCodeBench that we introduce. We release the language-agnostic training datasets (Ag-MBPP-X, Ag-Codeforces-X, Ag-LiveCodeBench-X), training code, and ready-to-use configurations, making RL post-training in any programming language as simple as editing a short YAML file.
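The key idea in the abstract, judging a program only by what it prints for a given input, can be illustrated with a minimal sketch. This is not the authors' verifier: the `LangConfig` fields, the `{src}` placeholder, and the `reward` function are hypothetical names chosen for illustration, and the sketch omits the sandboxing that a robust execution environment would need. It assumes each test is a pair of stdin text and expected stdout text.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class LangConfig:
    """Hypothetical per-language settings; not the paper's actual schema."""
    extension: str                            # source file suffix, e.g. ".lua"
    run_cmd: List[str]                        # "{src}" is replaced by the source path
    compile_cmd: Optional[List[str]] = None   # optional build step


def reward(code: str, tests: List[Tuple[str, str]], cfg: LangConfig,
           timeout: float = 10.0) -> float:
    """Return 1.0 if the program prints the expected stdout for every stdin, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / ("main" + cfg.extension)
        src.write_text(code)

        def fill(cmd: List[str]) -> List[str]:
            return [part.replace("{src}", str(src)) for part in cmd]

        try:
            if cfg.compile_cmd is not None:
                built = subprocess.run(fill(cfg.compile_cmd), cwd=tmp,
                                       capture_output=True, timeout=timeout)
                if built.returncode != 0:
                    return 0.0
            for stdin_text, expected in tests:
                run = subprocess.run(fill(cfg.run_cmd), cwd=tmp, input=stdin_text,
                                     capture_output=True, text=True, timeout=timeout)
                # Only externally observable behavior is checked: exit code and stdout.
                if run.returncode != 0 or run.stdout.strip() != expected.strip():
                    return 0.0
        except (subprocess.TimeoutExpired, OSError):
            return 0.0
    return 1.0


# Python stands in for the target language here so the sketch runs anywhere;
# switching languages would only mean changing the configuration values.
demo_cfg = LangConfig(extension=".py", run_cmd=[sys.executable, "{src}"])
demo_tests = [("3\n", "6"), ("10\n", "20")]
print(reward("print(int(input()) * 2)", demo_tests, demo_cfg))
```

Because the checker never inspects the source code itself, the same function serves every language; only the configuration values differ.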

Cite

Text

Boruch-Gruszecki et al. "Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment." International Conference on Learning Representations, 2026.

Markdown

[Boruch-Gruszecki et al. "Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/boruchgruszecki2026iclr-agnostics/)

BibTeX

@inproceedings{boruchgruszecki2026iclr-agnostics,
  title     = {{Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment}},
  author    = {Boruch-Gruszecki, Aleksander and Zi, Yangtian and Wu, Zixuan and Oberoi, Tejas and Anderson, Carolyn Jane and Biswas, Joydeep and Guha, Arjun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/boruchgruszecki2026iclr-agnostics/}
}