Sarwar, Zain

4 publications

TMLR 2025. Continual Pre-Training of MoEs: How Robust Is Your Router? Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, Sambit Sahu, Eugene Belilovsky, Irina Rish
NeurIPS 2025. Dense Backpropagation Improves Training for Sparse Mixture-of-Experts. Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Sambit Sahu, Tom Goldstein, Supriyo Chakraborty
NeurIPSW 2024. Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts. Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Stephen Rawls, Sambit Sahu, Supriyo Chakraborty, Tom Goldstein
NeurIPSW 2024. Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts. Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Stephen Rawls, Sambit Sahu, Supriyo Chakraborty, Tom Goldstein