Joshi, Rishabh

9 publications

ICLR 2025. "Building Math Agents with Multi-Turn Iterative Preference Learning." Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu.
ICLR 2025. "Learning from Negative Feedback, or Positive Feedback or Both." Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari, Jost Tobias Springenberg, Tim Hertweck, Michael Bloesch, Rishabh Joshi, Thomas Lampe, Junhyuk Oh, Nicolas Heess, Jonas Buchli, Martin Riedmiller.
ICLR 2025. "RRM: Robust Reward Model Training Mitigates Reward Hacking." Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasia Makarova, Jeremiah Zhe Liu, Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh.
ICML 2025. "Reward-Guided Prompt Evolving in Reinforcement Learning for LLMs." Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc V Le, Qijun Tan, Yuan Liu.
NeurIPSW 2024. "Evolving Alignment via Asymmetric Self-Play." Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc V Le, Qijun Tan, Yuan Liu.
ICML 2024. "Human Alignment of Large Language Models Through Online Preference Optimisation." Daniele Calandriello, Zhaohan Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot.
ICLR 2024. "Statistical Rejection Sampling Improves Preference Optimization." Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J Liu, Jialu Liu.
ICLR 2023. "Calibrating Sequence Likelihood Improves Conditional Language Generation." Yao Zhao, Mikhail Khalman, Rishabh Joshi, Shashi Narayan, Mohammad Saleh, Peter J Liu.
ICLR 2021. "DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues." Rishabh Joshi, Vidhisha Balachandran, Shikhar Vashishth, Alan Black, Yulia Tsvetkov.