Zhang, Ruoyu

1 publications

NeurIPSW 2023 Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao