Xiao, Wenjie

1 publication

ICLR 2026. DNT: A Deeply Normalized Transformer That Can Be Trained by Momentum SGD. Xianbiao Qi, Marco Chen, Wenjie Xiao, Jiaquan Ye, Yelin He, Chun-Guang Li, Zhouchen Lin