Yuan, Dun

1 publication

ICLR 2026. Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning. Dun Yuan, Di Wu, Xue Liu.