Xie, Yuejiao

1 publication

TMLR 2026. RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment. Yuhao Du, Zhuo Li, Pengyu Cheng, Zhihong Chen, Yuejiao Xie, Xiang Wan, Anningzhe Gao.