Hu, Yujing
29 publications
NeurIPS
2025
Improving Reward Models with Proximal Policy Exploration for Preference-Based Reinforcement Learning
NeurIPSW
2024
Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning
ICLR
2023
EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-Choice Dynamics Model
NeurIPSW
2022
EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-Choice Dynamics Model
NeurIPSW
2022
Model and Method: Training-Time Attack for Cooperative Multi-Agent Reinforcement Learning