Kang, Dongyeop
16 publications
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment. ICLR, 2025.
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models. ICML, 2025.
Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment. NeurIPSW, 2024.