Yang, Shentao

7 publications

TMLR 2025. "Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model." Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Hassan Awadalla, Weizhu Chen, Mingyuan Zhou.
ICML 2024. "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference." Shentao Yang, Tianqi Chen, Mingyuan Zhou.
ICLR 2023. "Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems." Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang.
NeurIPS 2023. "Preference-Grounded Token-Level Guidance for Language Model Fine-Tuning." Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou.
NeurIPS 2022. "A Unified Framework for Alternating Offline Model Training and Policy Learning." Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou.
NeurIPS Workshop 2022. "Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems." Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang.
ICML 2022. "Regularizing a Model-Based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning." Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou.