Hosseini, Arian

12 publications

ICLR 2025 Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models. Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville
ICLR 2025 Generative Verifiers: Reward Modeling as Next-Token Prediction. Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal
ICLR 2025 Smaller, Weaker, yet Better: Training LLM Reasoners via Compute-Optimal Sampling. Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi
NeurIPSW 2024 Faster, More Efficient RLHF Through Off-Policy Asynchronous Learning. Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville
NeurIPSW 2024 Generative Verifiers: Reward Modeling as Next-Token Prediction. Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal
NeurIPSW 2024 Generative Verifiers: Reward Modeling as Next-Token Prediction. Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal
NeurIPSW 2024 Not All LLM Reasoners Are Created Equal. Arian Hosseini, Alessandro Sordoni, Daniel Kenji Toyama, Aaron Courville, Rishabh Agarwal
NeurIPSW 2024 Not All LLM Reasoners Are Created Equal. Arian Hosseini, Alessandro Sordoni, Daniel Kenji Toyama, Aaron Courville, Rishabh Agarwal
NeurIPSW 2024 Smaller, Weaker, yet Better: Training LLM Reasoners via Compute-Optimal Sampling. Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi
NeurIPS 2023 Joint Prompt Optimization of Stacked LLMs Using Variational Inference. Alessandro Sordoni, Eric Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
ICLR 2019 Learning to Understand Goal Specifications by Modelling Reward. Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette
NeurIPS 2019 Ordered Memory. Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron C. Courville