Wu, Denny

37 publications

NeurIPS 2025 Emergence and Scaling Laws in SGD Learning of Shallow Neural Networks Yunwei Ren, Eshaan Nichani, Denny Wu, Jason D. Lee
NeurIPS 2025 From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers Ryotaro Kawata, Yujin Song, Alberto Bietti, Naoki Nishikawa, Taiji Suzuki, Samuel Vaiter, Denny Wu
NeurIPS 2025 How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime? Wei Huang, Andi Han, Yujin Song, Yilan Chen, Denny Wu, Difan Zou, Taiji Suzuki
COLT 2025 Learning Compositional Functions with Transformers from Easy-to-Hard Data Zixuan Wang, Eshaan Nichani, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, Denny Wu
ICLR 2025 Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu
NeurIPS 2025 Learning Quadratic Neural Networks in High Dimensions: SGD Dynamics and Scaling Laws Gerard Ben Arous, Murat A. Erdogdu, Nuri Mert Vural, Denny Wu
COLT 2025 Mean-Field Analysis of Polynomial-Width Two-Layer Neural Network Beyond Finite Time Horizon Margalit Glasgow, Denny Wu, Joan Bruna
ICML 2025 Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation Juno Kim, Denny Wu, Jason D. Lee, Taiji Suzuki
ICML 2025 Nonlinear Transformers Can Perform Inference-Time Feature Learning Naoki Nishikawa, Yujin Song, Kazusato Oko, Denny Wu, Taiji Suzuki
NeurIPS 2025 When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu
ICLR 2024 Improved Statistical and Computational Complexity of the Mean-Field Langevin Dynamics Under Structured Data Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, Denny Wu
ICMLW 2024 Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu
COLT 2024 Learning Sum of Diverse Features: Computational Hardness and Efficient Gradient-Based Training for Ridge Combinations Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu
ICMLW 2024 Neural Network Learns Low-Dimensional Polynomials with SGD near the Information-Theoretic Limit Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu
NeurIPS 2024 Neural Network Learns Low-Dimensional Polynomials with SGD near the Information-Theoretic Limit Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu
COLT 2024 Nonlinear Spiked Covariance Matrices and Signal Propagation in Deep Neural Networks Zhichao Wang, Denny Wu, Zhou Fan
NeurIPS 2024 Pretrained Transformer Efficiently Learns Low-Dimensional Target Functions In-Context Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu
ICML 2024 SILVER: Single-Loop Variance Reduction and Application to Federated Learning Kazusato Oko, Shunta Akiyama, Denny Wu, Tomoya Murata, Taiji Suzuki
ICMLW 2024 Transformer Efficiently Learns Low-Dimensional Target Functions In-Context Yujin Song, Denny Wu, Kazusato Oko, Taiji Suzuki
AISTATS 2024 Why Is Parameter Averaging Beneficial in SGD? An Objective Smoothing Perspective Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu
NeurIPS 2023 Convergence of Mean-Field Langevin Dynamics: Time-Space Discretization, Stochastic Gradient, and Variance Reduction Taiji Suzuki, Denny Wu, Atsushi Nitanda
NeurIPS 2023 Feature Learning via Mean-Field Langevin Dynamics: Classifying Sparse Parities and Beyond Taiji Suzuki, Denny Wu, Kazusato Oko, Atsushi Nitanda
NeurIPS 2023 Gradient-Based Feature Learning Under Structured Data Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu
NeurIPSW 2023 How Structured Data Guides Feature Learning: A Case Study of the Parity Problem Atsushi Nitanda, Kazusato Oko, Taiji Suzuki, Denny Wu
NeurIPS 2023 Learning in the Presence of Low-Dimensional Structure: A Spiked Random Matrix Perspective Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu
ICML 2023 Primal and Dual Analysis of Entropic Fictitious Play for Finite-Sum Problems Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki
ICLR 2023 Uniform-in-Time Propagation of Chaos for the Mean-Field Gradient Langevin Dynamics Taiji Suzuki, Atsushi Nitanda, Denny Wu
AISTATS 2022 Convex Analysis of the Mean Field Langevin Dynamics Atsushi Nitanda, Denny Wu, Taiji Suzuki
NeurIPS 2022 High-Dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
ICLR 2022 Particle Stochastic Dual Coordinate Ascent: Exponential Convergent Algorithm for Mean Field Neural Network Optimization Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu
NeurIPS 2022 Two-Layer Neural Network on Infinite-Dimensional Data: Global Optimization Guarantee in the Mean-Field Regime Naoki Nishikawa, Taiji Suzuki, Atsushi Nitanda, Denny Wu
ICLR 2022 Understanding the Variance Collapse of SVGD in High Dimensions Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang
NeurIPS 2021 Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis Atsushi Nitanda, Denny Wu, Taiji Suzuki
ICLR 2021 When Does Preconditioning Help or Hurt Generalization? Shun-ichi Amari, Jimmy Ba, Roger Baker Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
ICLR 2020 Generalization of Two-Layer Neural Networks: An Asymptotic Viewpoint Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang
NeurIPS 2020 On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression Denny Wu, Ji Xu
ICLR 2019 Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Hirofumi Ohta, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu