Tóth, Sándor

1 publication

NeurIPS 2025. Tapered Off-Policy REINFORCE - Stable and Efficient Reinforcement Learning for Large Language Models. Nicolas Le Roux, Marc G Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alexandre Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Tóth, Sam Work