Morwani, Depen

14 publications

ICLR 2025: "A New Perspective on Shampoo's Preconditioner." Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham M. Kakade, Lucas Janson.
ICLR 2025: "Deconstructing What Makes a Good Optimizer for Autoregressive Language Models." Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade.
ICLR 2025: "How Does Critical Batch Size Scale in Pre-Training?" Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade.
ICLR 2025: "SOAP: Improving and Stabilizing Shampoo Using Adam for Language Modeling." Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade.
ICML 2024 Workshop: "AdaMeM: Memory Efficient Momentum for Adafactor." Nikhil Vyas, Depen Morwani, Sham M. Kakade.
ICML 2024: "Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning." Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham M. Kakade, Boaz Barak.
NeurIPS 2024 Workshop: "Connections Between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging." Depen Morwani, Nikhil Vyas, Hanlin Zhang, Sham M. Kakade.
NeurIPS 2024 Workshop: "Deconstructing What Makes a Good Optimizer for Language Models." Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade.
ICLR 2024: "Feature Emergence via Margin Maximization: Case Studies in Algebraic Tasks." Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham M. Kakade.
NeurIPS 2024 Workshop: "How Does Critical Batch Size Scale in Pre-Training?" Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade.
NeurIPS 2024 Workshop: "SOAP: Improving and Stabilizing Shampoo Using Adam." Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade.
NeurIPS 2023: "Feature-Learning Networks Are Consistent Across Widths at Realistic Scales." Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan.
NeurIPS 2023: "Simplicity Bias in 1-Hidden Layer Neural Networks." Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli.
ALT 2022: "Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets." Depen Morwani, Harish G. Ramaswamy.