Ye, Jiaquan

2 publications

ICLR 2026. DNT: A Deeply Normalized Transformer That Can Be Trained by Momentum SGD. Xianbiao Qi, Marco Chen, Wenjie Xiao, Jiaquan Ye, Yelin He, Chun-Guang Li, Zhouchen Lin
ICLR 2025. Taming Transformer Without Using Learning Rate Warmup. Xianbiao Qi, Yelin He, Jiaquan Ye, Chun-Guang Li, Bojia Zi, Xili Dai, Qin Zou, Rong Xiao