Khalman, Misha

2 publications

ICLR 2025. Building Math Agents with Multi-Turn Iterative Preference Learning. Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu
ICLR 2024. Statistical Rejection Sampling Improves Preference Optimization. Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J Liu, Jialu Liu