Morwani, Depen
14 publications
ICLR
2025
A New Perspective on Shampoo's Preconditioner
Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham M. Kakade, Lucas Janson
ICLR
2025
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade
ICLR
2025
How Does Critical Batch Size Scale in Pre-Training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade
ICLR
2025
SOAP: Improving and Stabilizing Shampoo Using Adam for Language Modeling
Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade
ICMLW
2024
AdaMeM: Memory Efficient Momentum for Adafactor
Nikhil Vyas, Depen Morwani, Sham M. Kakade
ICML
2024
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham M. Kakade, Boaz Barak
NeurIPSW
2024
Connections Between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging
Depen Morwani, Nikhil Vyas, Hanlin Zhang, Sham M. Kakade
NeurIPSW
2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade
ICLR
2024
Feature Emergence via Margin Maximization: Case Studies in Algebraic Tasks
Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham M. Kakade
NeurIPSW
2024
How Does Critical Batch Size Scale in Pre-Training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade
NeurIPSW
2024
SOAP: Improving and Stabilizing Shampoo Using Adam
Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade
NeurIPS
2023
Feature-Learning Networks Are Consistent Across Widths at Realistic Scales
Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
NeurIPS
2023
Simplicity Bias in 1-Hidden Layer Neural Networks
Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli
ALT
2022
Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets
Depen Morwani, Harish G. Ramaswamy