ML Anthology
Work, Sam
1 publication
NeurIPS 2025
Tapered Off-Policy REINFORCE - Stable and Efficient Reinforcement Learning for Large Language Models
Nicolas Le Roux, Marc G Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alexandre Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Tóth, Sam Work