Yang, Shentao
8 publications
TMLR
2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Hassan Awadalla, Weizhu Chen, Mingyuan Zhou
ICML
2024
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang, Tianqi Chen, Mingyuan Zhou
ICLR
2023
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems
Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang
NeurIPS
2023
Preference-Grounded Token-Level Guidance for Language Model Fine-Tuning
Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou
NeurIPS
2022
A Unified Framework for Alternating Offline Model Training and Policy Learning
Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
NeurIPSW
2022
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems
Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang
NeurIPSW
2022
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems
Yihao Feng, Shentao Yang, Shujian Zhang, Jianguo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang
ICML
2022
Regularizing a Model-Based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou