Kakade, Sham M.

120 publications

ICLR 2025 A New Perspective on Shampoo's Preconditioner Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham M. Kakade, Lucas Janson
ICLR 2025 Deconstructing What Makes a Good Optimizer for Autoregressive Language Models Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade
ICLR 2025 Eliminating Position Bias of Language Models: A Mechanistic Approach Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji
NeurIPS 2025 EvoLM: In Search of Lost Training Dynamics for Language Model Reasoning Zhenting Qi, Fan Nie, Alexandre Alahi, James Zou, Himabindu Lakkaraju, Yilun Du, Eric P. Xing, Sham M. Kakade, Hanlin Zhang
ICLR 2025 Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond Costin-Andrei Oncescu, Sanket Purandare, Stratos Idreos, Sham M. Kakade
ICLR 2025 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems Zhenting Qi, Hanlin Zhang, Eric P. Xing, Sham M. Kakade, Himabindu Lakkaraju
ICLR 2025 How Does Critical Batch Size Scale in Pre-Training? Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade
TMLR 2025 Loss-to-Loss Prediction: Scaling Laws for All Datasets David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, Sham M. Kakade
ICLR 2025 Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models Yuda Song, Hanlin Zhang, Carson Eisenach, Sham M. Kakade, Dean Foster, Udaya Ghai
ICLR 2025 Mixture of Parrots: Experts Improve Memorization More than Reasoning Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham M. Kakade, Eran Malach
ICLR 2025 SOAP: Improving and Stabilizing Shampoo Using Adam for Language Modeling Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade
ICML 2025 The Role of Sparsity for Length Generalization in LLMs Noah Golowich, Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
ICML 2025 Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham M. Kakade, Sitan Chen
ICLRW 2025 Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham M. Kakade, Sitan Chen
ICML 2025 Universal Length Generalization with Turing Programs Kaiying Hou, David Brandfonbrener, Sham M. Kakade, Samy Jelassi, Eran Malach
ICMLW 2024 AdaMeM: Memory Efficient Momentum for Adafactor Nikhil Vyas, Depen Morwani, Sham M. Kakade
ICML 2024 Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham M. Kakade, Boaz Barak
NeurIPSW 2024 Connections Between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging Depen Morwani, Nikhil Vyas, Hanlin Zhang, Sham M. Kakade
NeurIPSW 2024 Deconstructing What Makes a Good Optimizer for Language Models Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade
NeurIPSW 2024 Distributional Scaling Laws for Emergent Capabilities Rosie Zhao, Naomi Saphra, Sham M. Kakade
NeurIPSW 2024 Eliminating Position Bias of Language Models: A Mechanistic Approach Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji
ICLR 2024 Feature Emergence via Margin Maximization: Case Studies in Algebraic Tasks Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham M. Kakade
ICLRW 2024 Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems Zhenting Qi, Hanlin Zhang, Eric P. Xing, Sham M. Kakade, Himabindu Lakkaraju
NeurIPSW 2024 How Does Critical Batch Size Scale in Pre-Training? Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade
TMLR 2024 Koopman Spectrum Nonlinear Regulators and Efficient Online Learning Motoya Ohnishi, Isao Ishikawa, Kendall Lowrey, Masahiro Ikeda, Sham M. Kakade, Yoshinobu Kawahara
NeurIPS 2024 Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade
NeurIPSW 2024 Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models Yuda Song, Hanlin Zhang, Carson Eisenach, Sham M. Kakade, Dean Foster, Udaya Ghai
NeurIPSW 2024 Mixture of Parrots: Mixtures of Experts Improve Memorization More than Reasoning Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham M. Kakade, Eran Malach
ICML 2024 Q-Probe: A Lightweight Approach to Reward Maximization for Language Models Kenneth Li, Samy Jelassi, Hugh Zhang, Sham M. Kakade, Martin Wattenberg, David Brandfonbrener
ICML 2024 Repeat After Me: Transformers Are Better than State Space Models at Copying Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
NeurIPSW 2024 SOAP: Improving and Stabilizing Shampoo Using Adam Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade
TMLR 2024 Scaling Laws for Imitation Learning in Single-Agent Games Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik R Narasimhan, Sham M. Kakade
NeurIPS 2024 Scaling Laws in Linear Regression: Compute, Parameters, and Data Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee
NeurIPSW 2023 A Study on the Calibration of In-Context Learning Hanlin Zhang, YiFan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric P. Xing, Himabindu Lakkaraju, Sham M. Kakade
JMLR 2023 Benign Overfitting of Constant-Stepsize SGD for Linear Regression Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
ICML 2023 Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
ICML 2023 Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games Dylan J Foster, Noah Golowich, Sham M. Kakade
NeurIPSW 2023 MatFormer: Nested Transformer for Elastic Inference Fnu Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit S Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham M. Kakade, Ali Farhadi, Prateek Jain
JMLR 2023 Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity Kaiqing Zhang, Sham M. Kakade, Tamer Basar, Lin F. Yang
ICML 2023 On Provable Copyright Protection for Generative Models Nikhil Vyas, Sham M. Kakade, Boaz Barak
ICMLW 2023 Predicting Task Forgetting in Large Language Models Anat Kleiman, Jonathan Frankle, Sham M. Kakade, Mansheej Paul
ICLR 2023 The Role of Coverage in Online Reinforcement Learning Tengyang Xie, Dylan J Foster, Yu Bai, Nan Jiang, Sham M. Kakade
ICLR 2022 Anti-Concentrated Confidence Bonuses for Scalable Exploration Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham M. Kakade
ICLR 2022 Multi-Stage Episodic Control for Strategic Exploration in Text Games Jens Tuyls, Shunyu Yao, Sham M. Kakade, Karthik R Narasimhan
ICLR 2021 Few-Shot Learning via Learning the Representation, Provably Simon Shaolei Du, Wei Hu, Sham M. Kakade, Jason D. Lee, Qi Lei
JMLR 2021 On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
ICLR 2021 Optimal Regularization Can Mitigate Double Descent Preetum Nakkiran, Prayaag Venkat, Sham M. Kakade, Tengyu Ma
ICLR 2021 What Are the Statistical Limits of Offline RL with Linear Function Approximation? Ruosong Wang, Dean Foster, Sham M. Kakade
ICLR 2020 Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
COLT 2020 Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
NeurIPS 2019 Meta-Learning with Implicit Gradients Aravind Rajeswaran, Chelsea Finn, Sham M. Kakade, Sergey Levine
COLT 2019 Open Problem: Do Good Algorithms Necessarily Query Bad Points? Rong Ge, Prateek Jain, Sham M. Kakade, Rahul Kidambi, Dheeraj M. Nagaraj, Praneeth Netrapalli
NeurIPS 2019 The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure for Least Squares Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli
NeurIPS 2018 A Smoother Way to Train Structured Prediction Models Venkata Krishna Pillutla, Vincent Roulet, Sham M. Kakade, Zaid Harchaoui
COLT 2018 Accelerating Stochastic Gradient Descent for Least Squares Regression Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
ICLR 2018 On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
NeurIPS 2018 Provably Correct Automatic Sub-Differentiation for Qualified Programs Sham M. Kakade, Jason Lee
AISTATS 2017 Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli
ICML 2017 How to Escape Saddle Points Efficiently Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
ICLR 2017 Learning Features of Music from Scratch John Thickstun, Zaïd Harchaoui, Sham M. Kakade
NeurIPS 2017 Learning Overcomplete HMMs Vatsal Sharan, Sham M. Kakade, Percy Liang, Gregory Valiant
NeurIPS 2017 Towards Generalization and Simplicity in Continuous Control Aravind Rajeswaran, Kendall Lowrey, Emanuel V. Todorov, Sham M. Kakade
NeurIPS 2016 Provable Efficient Online Matrix Completion via Non-Convex Stochastic Gradient Descent Chi Jin, Sham M. Kakade, Praneeth Netrapalli
COLT 2016 Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford
COLT 2015 Competing with the Empirical Risk Minimizer in a Single Pass Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford
NeurIPS 2015 Convergence Rates of Active Learning for Maximum Likelihood Estimation Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi
NeurIPS 2015 Super-Resolution Off the Grid Qingqing Huang, Sham M. Kakade
ALT 2015 Tensor Decompositions for Learning Latent Variable Models (a Survey for ALT) Anima Anandkumar, Rong Ge, Daniel J. Hsu, Sham M. Kakade, Matus Telgarsky
JMLR 2014 A Tensor Approach to Learning Mixed Membership Community Models Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade
JMLR 2014 Tensor Decompositions for Learning Latent Variable Models Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky
JMLR 2013 A Risk Comparison of Ordinary Least Squares vs Ridge Regression Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar
COLT 2013 A Tensor Spectral Approach to Learning Mixed Membership Community Models Animashree Anandkumar, Rong Ge, Daniel J. Hsu, Sham M. Kakade
NeurIPS 2013 When Are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity Anima Anandkumar, Daniel J. Hsu, Majid Janzamin, Sham M. Kakade
COLT 2012 (weak) Calibration Is Computationally Hard Elad Hazan, Sham M. Kakade
COLT 2012 A Method of Moments for Mixture Models and Hidden Markov Models Animashree Anandkumar, Daniel Hsu, Sham M. Kakade
NeurIPS 2012 A Spectral Algorithm for Latent Dirichlet Allocation Anima Anandkumar, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Yi-kai Liu
NeurIPS 2012 Identifiability and Unmixing of Latent Parse Trees Daniel J. Hsu, Sham M. Kakade, Percy Liang
NeurIPS 2012 Learning Mixtures of Tree Graphical Models Anima Anandkumar, Daniel J. Hsu, Furong Huang, Sham M. Kakade
COLT 2012 Random Design Analysis of Ridge Regression Daniel Hsu, Sham M. Kakade, Tong Zhang
JMLR 2012 Regularization Techniques for Learning with Matrices Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari
COLT 2012 Towards Minimax Policies for Online Linear Optimization with Bandit Feedback Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade
NeurIPS 2011 Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression Sham M. Kakade, Varun Kanade, Ohad Shamir, Adam Kalai
COLT 2011 Preface Sham M. Kakade, Ulrike Luxburg
NeurIPS 2011 Spectral Methods for Learning Multivariate Latent Tree Structure Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang
NeurIPS 2011 Stochastic Convex Optimization with Bandit Feedback Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin
ICML 2010 Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias W. Seeger
MLJ 2010 Guest Editorial: Special Issue on Learning Theory Sham M. Kakade, Ping Li
NeurIPS 2010 Learning from Logged Implicit Exploration Data Alex Strehl, John Langford, Lihong Li, Sham M. Kakade
COLT 2009 A Spectral Algorithm for Learning Hidden Markov Models Daniel J. Hsu, Sham M. Kakade, Tong Zhang
NeurIPS 2009 Multi-Label Prediction via Compressed Sensing Daniel J. Hsu, Sham M. Kakade, John Langford, Tong Zhang
ICML 2009 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chaudhuri, Sham M. Kakade, Karen Livescu, Karthik Sridharan
COLT 2008 An Information Theoretic Framework for Multi-View Learning Karthik Sridharan, Sham M. Kakade
ICML 2008 Efficient Bandit Algorithms for Online Multiclass Prediction Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari
COLT 2008 High-Probability Regret Bounds for Bandit Online Linear Optimization Peter L. Bartlett, Varsha Dani, Thomas P. Hayes, Sham M. Kakade, Alexander Rakhlin, Ambuj Tewari
NeurIPS 2008 Mind the Duality Gap: Logarithmic Regret Algorithms for Online Optimization Shai Shalev-Shwartz, Sham M. Kakade
NeurIPS 2008 On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization Sham M. Kakade, Karthik Sridharan, Ambuj Tewari
NeurIPS 2008 On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade, Ambuj Tewari
COLT 2008 Stochastic Linear Optimization Under Bandit Feedback Varsha Dani, Thomas P. Hayes, Sham M. Kakade
ICCV 2007 Leveraging Archival Video for Building Face Datasets Deva Ramanan, Simon Baker, Sham M. Kakade
AISTATS 2007 Maximum Entropy Correlated Equilibria Luis E. Ortiz, Robert E. Schapire, Sham M. Kakade
COLT 2007 Multi-View Regression via Canonical Correlation Analysis Sham M. Kakade, Dean P. Foster
NeurIPS 2007 The Price of Bandit Information for Online Optimization Varsha Dani, Sham M. Kakade, Thomas P. Hayes
IJCAI 2007 The Value of Observation for Monitoring Dynamic Systems Eyal Even-Dar, Sham M. Kakade, Yishay Mansour
ICML 2006 Cover Trees for Nearest Neighbor Alina Beygelzimer, Sham M. Kakade, John Langford
UAI 2005 Planning in POMDPs Using Multiplicity Automata Eyal Even-Dar, Sham M. Kakade, Yishay Mansour
IJCAI 2005 Reinforcement Learning in POMDPs Without Resets Eyal Even-Dar, Sham M. Kakade, Yishay Mansour
COLT 2005 Trading in Markovian Price Models Sham M. Kakade, Michael J. Kearns
NeurIPS 2005 Worst-Case Bounds for Gaussian Process Models Sham M. Kakade, Matthias W. Seeger, Dean P. Foster
COLT 2004 Deterministic Calibration and Nash Equilibrium Sham M. Kakade, Dean P. Foster
NeurIPS 2004 Economic Properties of Social Networks Sham M. Kakade, Michael Kearns, Luis E. Ortiz, Robin Pemantle, Siddharth Suri
NeurIPS 2004 Experts in a Markov Decision Process Eyal Even-Dar, Sham M. Kakade, Yishay Mansour
COLT 2004 Graphical Economics Sham M. Kakade, Michael J. Kearns, Luis E. Ortiz
NeurIPS 2004 Online Bounds for Bayesian Algorithms Sham M. Kakade, Andrew Y. Ng
ICML 2003 Exploration in Metric State Spaces Sham M. Kakade, Michael J. Kearns, John Langford
NeurIPS 2003 Policy Search by Dynamic Programming J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng
ICML 2002 An Alternate Objective Function for Markovian Fields Sham M. Kakade, Yee Whye Teh, Sam T. Roweis
ICML 2002 Approximately Optimal Approximate Reinforcement Learning Sham M. Kakade, John Langford
ICML 2002 Competitive Analysis of the Explore/Exploit Tradeoff John Langford, Martin Zinkevich, Sham M. Kakade
NeurIPS 2001 A Natural Policy Gradient Sham M. Kakade
COLT 2001 Optimizing Average Reward Using Discounted Rewards Sham M. Kakade