Saremi, Omid
9 publications
NeurIPS
2024
How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Colin Sandon, Omid Saremi
NeurIPS
2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind
ICLR
2024
LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures
Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin
TMLR
2024
The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods
Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua M. Susskind
ICLR
2024
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin
ICLR
2024
What Algorithms Can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua M. Susskind, Samy Bengio, Preetum Nakkiran
ICLR
2024
When Can Transformers Reason with Abstract Symbols?
Enric Boix-Adserà, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua M. Susskind
NeurIPSW
2023
What Algorithms Can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Joshua Susskind, Samy Bengio, Preetum Nakkiran
NeurIPSW
2022
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua M. Susskind