Li, Chenliang
25 publications
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment. ICLR, 2025.
Reinforcement Learning in Inference Time: A Perspective from Successive Policy Iterations. ICLR Workshop, 2025.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment. NeurIPS, 2024.
Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment. NeurIPS Workshop, 2024.