Garcia, Alfredo
14 publications
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment. ICLR 2025.
Reinforcement Learning in Inference Time: A Perspective from Successive Policy Iterations. ICLRW 2025.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment. NeurIPS 2024.