Varre, Aditya

6 publications

ICML 2025. Learning In-Context $n$-Grams with Transformers: Sub-$n$-Grams Are Near-Stationary Points. Aditya Varre, Gizem Yüce, Nicolas Flammarion
NeurIPS 2024. SGD vs GD: Rank Deficiency in Linear Networks. Aditya Varre, Margarita Sagitova, Nicolas Flammarion
ICMLW 2024. SGD vs GD: Rank Deficiency in Linear Networks. Aditya Varre, Margarita Sagitova, Nicolas Flammarion
NeurIPS 2024. Why Do We Need Weight Decay in Modern Deep Learning? Francesco D'Angelo, Maksym Andriushchenko, Aditya Varre, Nicolas Flammarion
NeurIPSW 2023. Why Do We Need Weight Decay for Overparameterized Deep Networks? Francesco D'Angelo, Aditya Varre, Maksym Andriushchenko, Nicolas Flammarion
COLT 2022. Accelerated SGD for Non-Strongly-Convex Least Squares. Aditya Varre, Nicolas Flammarion