Littwin, Etai

23 publications

ICML 2025 Distillation Scaling Laws Dan Busbridge, Amitis Shidani, Floris Weers, Jason Ramapuram, Etai Littwin, Russell Webb
NeurIPSW 2024 Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning Etai Littwin, Vimal Thilak, Anand Gopalakrishnan
NeurIPS 2024 How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind
ICLR 2024 LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin
TMLR 2024 The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua M. Susskind
ICLR 2024 Vanishing Gradients in Reinforcement Finetuning of Language Models Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin
ICLR 2024 What Algorithms Can Transformers Learn? A Study in Length Generalization Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua M. Susskind, Samy Bengio, Preetum Nakkiran
ICLR 2024 When Can Transformers Reason with Abstract Symbols? Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua M. Susskind
ICLR 2023 Adaptive Optimization in the ∞-Width Limit Etai Littwin, Greg Yang
ICML 2023 Stabilizing Transformer Training by Preventing Attention Entropy Collapse Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M. Susskind
TMLR 2023 Tight Conditions for When the NTK Approximation Is Valid Enric Boix-Adserà, Etai Littwin
NeurIPS 2023 Transformers Learn Through Gradual Rank Increase Enric Boix-Adserà, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind
NeurIPSW 2023 What Algorithms Can Transformers Learn? A Study in Length Generalization Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua Susskind, Samy Bengio, Preetum Nakkiran
ICLR 2022 Learning Representation from Neural Fisher Kernel with Low-Rank Approximation Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Joshua M. Susskind
NeurIPSW 2022 The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua M. Susskind
UAI 2021 On Random Kernels of Residual Architectures Etai Littwin, Tomer Galanti, Lior Wolf
ICML 2021 Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics Greg Yang, Etai Littwin
NeurIPS 2020 Collegial Ensembles Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan
NeurIPS 2020 On Infinite-Width Hypernetworks Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang
ICMLW 2019 On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width Etai Littwin, Lior Wolf
NeurIPS 2018 Regularizing by the Variance of the Activations' Sample-Variances Etai Littwin, Lior Wolf
CVPR 2016 The Multiverse Loss for Robust Transfer Learning Etai Littwin, Lior Wolf
CVPR 2015 Spherical Embedding of Inlier Silhouette Dissimilarities Etai Littwin, Hadar Averbuch-Elor, Daniel Cohen-Or