ML Anthology
Yan, Dong
14 publications
ICLR
2025
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan
ICLR
2025
Learning LLM-as-a-Judge for Preference Alignment
Ziyi Ye, Xiangsheng Li, Qiuchi Li, Qingyao Ai, Yujia Zhou, Wei Shen, Dong Yan, Yiqun Liu
ICML
2025
STAIR: Improving Safety Alignment with Introspective Reasoning
Yichi Zhang, Siyuan Zhang, Yao Huang, Zeyu Xia, Zhengwei Fang, Xiao Yang, Ranjie Duan, Dong Yan, Yinpeng Dong, Jun Zhu
AAAI
2025
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang
ICML
2024
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan
IJCAI
2023
On the Reuse Bias in Off-Policy Reinforcement Learning
Chengyang Ying, Zhongkai Hao, Xinning Zhou, Hang Su, Dong Yan, Jun Zhu
AAAI
2022
Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model
Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu
MLOSS
2022
Tianshou: A Highly Modularized Deep Reinforcement Learning Library
Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, Jun Zhu
IJCAI
2022
Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk
Chengyang Ying, Xinning Zhou, Hang Su, Dong Yan, Ning Chen, Jun Zhu
IJCAI
2021
Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu
Yunsheng Zhang, Dong Yan, Bei Shi, Haobo Fu, Qiang Fu, Hang Su, Jun Zhu, Ning Chen
AAAI
2021
Learning Task-Distribution Reward Shaping with Meta-Learning
Haosheng Zou, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu
ICMLW
2021
Towards Safe Reinforcement Learning via Constraining Conditional Value at Risk
Chengyang Ying, Xinning Zhou, Dong Yan, Jun Zhu
ICLR
2020
Lazy-CFR: Fast and Near-Optimal Regret Minimization for Extensive Games with Imperfect Information
Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu
IJCAI
2019
Playing FPS Games with Environment-Aware Hierarchical Reinforcement Learning
Shihong Song, Jiayi Weng, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu