ML Anthology
Zhang, Yushun
12 publications
ICLR
2025
Adam-Mini: Use Fewer Learning Rates to Gain More
Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun
TMLR
2025
Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective
Senmiao Wang, Yupeng Chen, Yushun Zhang, Ruoyu Sun, Tian Ding
TMLR
2025
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen, Senmiao Wang, Yushun Zhang, Zhihang Lin, Haozhe Zhang, Weijian Sun, Tian Ding, Ruoyu Sun
ICMLW
2024
Adam-Mini: Use Fewer Learning Rates to Gain More
Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun
NeurIPSW
2024
GaLore-Mini: Low Rank Gradient Learning with Fewer Learning Rates
Weihao Huang, Zhenyu Zhang, Yushun Zhang, Zhi-Quan Luo, Ruoyu Sun, Zhangyang Wang
ICML
2024
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
NeurIPS
2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
ICMLW
2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
ICMLW
2023
Breaking the Curse of Depth in Graph Convolutional Networks via Refined Initialization Strategy
Senmiao Wang, Yupeng Chen, Yushun Zhang, Tian Ding, Ruoyu Sun
NeurIPS
2022
Adam Can Converge Without Any Modification on Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
ICLR
2022
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo
NeurIPS
2021
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo