Jaggi, Martin

117 publications

TMLR 2026 Leveraging the True Depth of LLMs Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret
TMLR 2026 Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs Thierry Bossy, Julien Tuấn Tú Vignoud, Tahseen Rabbani, Juan R. Troncoso Pastoriza, Martin Jaggi
ICLR 2025 Attention with Markov: A Curious Case of Single-Layer Transformers Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
ICLR 2025 CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
ICLR 2025 Effective Interplay Between Sparsity and Quantization: From Theory to Practice Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh
NeurIPS 2025 Enhancing Multilingual LLM Pretraining with Model-Based Data Selection Bettina Messmer, Vinko Sabolčec, Martin Jaggi
ICLRW 2025 Enhancing Multilingual LLM Pretraining with Model-Based Data Selection Bettina Messmer, Vinko Sabolčec, Martin Jaggi
NeurIPS 2025 GRAPE: Optimize Data Mixture for Group Robust Multi-Target Adaptive Pretraining Simin Fan, Maria Ios Glarou, Martin Jaggi
AISTATS 2025 Improving Stochastic Cubic Newton with Momentum El Mahdi Chayti, Nikita Doikov, Martin Jaggi
ICLR 2025 Intrinsic User-Centric Interpretability Through Global Mixture of Experts Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser
ICLRW 2025 Leveraging the True Depth of LLMs Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret
ICML 2025 On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan, Bettina Messmer, Nikita Doikov, Martin Jaggi
ICLRW 2025 On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists Dongyang Fan, Bettina Messmer, Nikita Doikov, Martin Jaggi
NeurIPS 2025 Towards Fully FP8 GEMM LLM Training at Scale Alejandro Hernández-Cano, Dhia Garbaya, Imanol Schlag, Martin Jaggi
TMLR 2025 Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi
NeurIPS 2025 URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training Dongyang Fan, Vinko Sabolčec, Martin Jaggi
ICMLW 2024 Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS 2024 Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training Atli Kosson, Bettina Messmer, Martin Jaggi
ICMLW 2024 Attention with Markov: A Curious Case of Single-Layer Transformers Ashok Vardhan Makkuva, Marco Bondaschi, Alliot Nagle, Adway Girish, Hyeji Kim, Martin Jaggi, Michael Gastpar
NeurIPS 2024 CoBo: Collaborative Learning via Bilevel Optimization Diba Hashemi, Lie He, Martin Jaggi
ICML 2024 DOGE: Domain Reweighting with Generalization Estimation Simin Fan, Matteo Pagliardini, Martin Jaggi
NeurIPS 2024 DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi
AAAI 2024 Ghost Noise for Regularizing Deep Neural Networks Atli Kosson, Dongyang Fan, Martin Jaggi
ICML 2024 LASER: Linear Compression in Wireless Distributed Optimization Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael Gastpar
ICLR 2024 Layer-Wise Linear Mode Connectivity Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi
ICML 2024 On Convergence of Incremental Gradient for Non-Convex Smooth Functions Anastasia Koloskova, Nikita Doikov, Sebastian U Stich, Martin Jaggi
NeurIPS 2024 QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman
ICML 2024 Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS 2024 Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
ICMLW 2024 Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
ICML 2024 Spectral Preconditioning for Gradient Methods on Graded Non-Convex Functions Nikita Doikov, Sebastian U Stich, Martin Jaggi
ICML 2024 The Privacy Power of Correlated Noise in Decentralized Learning Youssef Allouah, Anastasia Koloskova, Aymane El Firdoussi, Martin Jaggi, Rachid Guerraoui
TMLR 2024 Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods El Mahdi Chayti, Martin Jaggi, Nikita Doikov
ICLR 2023 Agree to Disagree: Diversity Through Disagreement for Better Transferability Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
JMLR 2023 Beyond Spectral Gap: The Role of the Topology in Decentralized Learning Thijs Vogels, Hadrien Hendrikx, Martin Jaggi
NeurIPSW 2023 CoTFormer: More Tokens with Attention Make up for Less Depth Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
NeurIPS 2023 Collaborative Learning via Prediction Consensus Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi
NeurIPSW 2023 DOGE: Domain Reweighting with Generalization Estimation Simin Fan, Matteo Pagliardini, Martin Jaggi
NeurIPS 2023 Fast Attention over Long Sequences with Dynamic Sparse Flash Attention Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret
ICMLW 2023 Fast Causal Attention with Dynamic Sparsity Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret
NeurIPSW 2023 LASER: Linear Compression in Wireless Distributed Optimization Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael Gastpar
ICMLW 2023 Landmark Attention: Random-Access Infinite Context Length for Transformers Amirkeivan Mohtashami, Martin Jaggi
COLT 2023 Linearization Algorithms for Fully Composite Optimization Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion
NeurIPS 2023 MultiMoDN—Multimodal, Multi-Task, Interpretable Modular Networks Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley
NeurIPS 2023 Multiplication-Free Transformer Training via Piecewise Affine Operations Atli Kosson, Martin Jaggi
TMLR 2023 Provably Personalized and Robust Federated Learning Mariel Werner, Lie He, Michael Jordan, Martin Jaggi, Sai Praneeth Karimireddy
NeurIPS 2023 Random-Access Infinite Context Length for Transformers Amirkeivan Mohtashami, Martin Jaggi
NeurIPSW 2023 Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks Atli Kosson, Bettina Messmer, Martin Jaggi
ICML 2023 Second-Order Optimization with Lazy Hessians Nikita Doikov, El Mahdi Chayti, Martin Jaggi
ICML 2023 Special Properties of Gradient Descent with Large Learning Rates Amirkeivan Mohtashami, Martin Jaggi, Sebastian U Stich
NeurIPSW 2023 Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization Atli Kosson, Dongyang Fan, Martin Jaggi
AISTATS 2022 Masked Training of Neural Networks with Partial Gradients Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich
NeurIPS 2022 Beyond Spectral Gap: The Role of the Topology in Decentralized Learning Thijs Vogels, Hadrien Hendrikx, Martin Jaggi
ICLR 2022 Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing Sai Praneeth Karimireddy, Lie He, Martin Jaggi
NeurIPSW 2022 Data-Heterogeneity-Aware Mixing for Decentralized Learning Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U Stich
NeurIPSW 2022 Decentralized Stochastic Optimization with Client Sampling Ziwei Liu, Anastasia Koloskova, Martin Jaggi, Tao Lin
NeurIPSW 2022 Diversity Through Disagreement for Better Transferability Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
NeurIPS 2022 FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet, Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi, Giovanni Neglia, Marc Tommasi, Mathieu Andreux
AAAI 2022 Implicit Gradient Alignment in Distributed and Federated Learning Yatin Dandi, Luis Barba, Martin Jaggi
NeurIPS 2022 Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning Anastasiia Koloskova, Sebastian U Stich, Martin Jaggi
NeurIPSW 2022 Towards Provably Personalized Federated Learning via Threshold-Clustering of Similar Clients Mariel Werner, Lie He, Sai Praneeth Karimireddy, Michael Jordan, Martin Jaggi
AISTATS 2021 A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! Dmitry Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtarik, Sebastian Stich
AISTATS 2021 Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi
AISTATS 2021 LENA: Communication-Efficient Distributed Learning with Self-Triggered Gradient Uploads Hossein Shokri Ghadikolaei, Sebastian Stich, Martin Jaggi
FnTML 2021 Advances and Open Problems in Federated Learning Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista A. Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaïd Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konecný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Hang Qi, Daniel Ramage, Ramesh Raskar, Mariana Raykova, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao
NeurIPS 2021 Breaking the Centralized Barrier for Cross-Device Federated Learning Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U Stich, Ananda Theertha Suresh
ICML 2021 Consensus Control for Decentralized Deep Learning Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian Stich
ICML 2021 Exact Optimization of Conformal Predictors via Incremental and Decremental Learning Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi
NeurIPSW 2021 Interpreting Language Models Through Knowledge Graph Extraction Vinitra Swamy, Angelika Romanou, Martin Jaggi
ICML 2021 Learning from History for Byzantine Robust Optimization Sai Praneeth Karimireddy, Lie He, Martin Jaggi
ICML 2021 Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data Tao Lin, Sai Praneeth Karimireddy, Sebastian Stich, Martin Jaggi
NeurIPS 2021 RelaySum for Decentralized Deep Learning on Heterogeneous Data Thijs Vogels, Lie He, Anastasiia Koloskova, Sai Praneeth Karimireddy, Tao Lin, Sebastian U Stich, Martin Jaggi
ICCV 2021 Semantic Perturbations with Normalizing Flows for Improved Generalization Oguz Kaan Yüksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova
ICMLW 2021 Semantic Perturbations with Normalizing Flows for Improved Generalization Oğuz Kaan Yüksel, Sebastian U Stich, Martin Jaggi, Tatjana Chavdarova
ICLR 2021 Taming GANs with Lookahead-Minmax Tatjana Chavdarova, Matteo Pagliardini, Sebastian U Stich, François Fleuret, Martin Jaggi
ICLR 2021 Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr, Martin Jaggi
ICML 2020 A Unified Theory of Decentralized SGD with Changing Topology and Local Updates Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian Stich
AISTATS 2020 Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi
ICLR 2020 Decentralized Deep Learning with Arbitrary Communication Compression Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi
ICLR 2020 Don't Use Large Mini-Batches, Use Local SGD Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
ICLR 2020 Dynamic Model Pruning with Feedback Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi
NeurIPS 2020 Ensemble Distillation for Robust Model Fusion in Federated Learning Tao Lin, Lingjing Kong, Sebastian U Stich, Martin Jaggi
ICLR 2020 Evaluating the Search Phase of Neural Architecture Search Kaicheng Yu, Christian Sciuto, Martin Jaggi, Claudiu Musat, Mathieu Salzmann
ICML 2020 Extrapolation for Large-Batch Training in Deep Learning Tao Lin, Lingjing Kong, Sebastian Stich, Martin Jaggi
AISTATS 2020 Linearly Convergent Frank-Wolfe with Backtracking Line-Search Fabian Pedregosa, Geoffrey Negiar, Armin Askari, Martin Jaggi
NeurIPS 2020 Model Fusion via Optimal Transport Sidak Pal Singh, Martin Jaggi
ICLR 2020 On the Relationship Between Self-Attention and Convolutional Layers Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
ICML 2020 Optimizer Benchmarking Needs to Account for Hyperparameter Tuning Prabhu Teja Sivaprasad, Florian Mai, Thijs Vogels, Martin Jaggi, François Fleuret
NeurIPS 2020 Practical Low-Rank Communication Compression in Decentralized Deep Learning Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
ICLRW 2019 Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi
ICML 2019 Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication Anastasia Koloskova, Sebastian Stich, Martin Jaggi
AISTATS 2019 Efficient Greedy Coordinate Descent for Composite Problems Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
ICML 2019 Error Feedback Fixes SignSGD and Other Gradient Compression Schemes Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, Martin Jaggi
ICML 2019 Overcoming Multi-Model Forgetting Yassine Benyahia, Kaicheng Yu, Kamil Bennani Smires, Martin Jaggi, Anthony C. Davison, Mathieu Salzmann, Claudiu Musat
NeurIPS 2019 PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
NeurIPS 2019 Unsupervised Scalable Representation Learning for Multivariate Time Series Jean-Yves Franceschi, Aymeric Dieuleveut, Martin Jaggi
ICLRW 2019 Unsupervised Scalable Representation Learning for Multivariate Time Series Jean-Yves Franceschi, Aymeric Dieuleveut, Martin Jaggi
ICML 2018 A Distributed Second-Order Algorithm You Can Trust Celestine Duenner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi
AISTATS 2018 Adaptive Balancing of Gradient and Update Computation Times Using Global Geometry and Approximate Subproblems Sai Praneeth Reddy Karimireddy, Sebastian U. Stich, Martin Jaggi
NeurIPS 2018 COLA: Decentralized Linear Learning Lie He, An Bian, Martin Jaggi
ICML 2018 On Matching Pursuit and Coordinate Descent Francesco Locatello, Anant Raj, Sai Praneeth Karimireddy, Gunnar Raetsch, Bernhard Schölkopf, Sebastian Stich, Martin Jaggi
NeurIPS 2018 Sparsified SGD with Memory Sebastian U Stich, Jean-Baptiste Cordonnier, Martin Jaggi
NeurIPS 2018 Training DNNs with Hybrid Block Floating Point Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi
AISTATS 2017 A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi
ICML 2017 Approximate Steepest Coordinate Descent Sebastian U. Stich, Anant Raj, Martin Jaggi
NeurIPS 2017 Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems Celestine Dünner, Thomas Parnell, Martin Jaggi
AISTATS 2017 Faster Coordinate Descent via Adaptive Importance Sampling Dmytro Perekrestenko, Volkan Cevher, Martin Jaggi
NeurIPS 2017 Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees Francesco Locatello, Michael Tschannen, Gunnar Raetsch, Martin Jaggi
NeurIPS 2017 Safe Adaptive Importance Sampling Sebastian U Stich, Anant Raj, Martin Jaggi
ICML 2016 Primal-Dual Rates and Certificates Celestine Dünner, Simone Forte, Martin Takac, Martin Jaggi
ICML 2015 Adding vs. Averaging in Distributed Primal-Dual Optimization Chenxin Ma, Virginia Smith, Martin Jaggi, Michael Jordan, Peter Richtarik, Martin Takac
NeurIPS 2015 On the Global Linear Convergence of Frank-Wolfe Optimization Variants Simon Lacoste-Julien, Martin Jaggi
NeurIPS 2014 Communication-Efficient Distributed Dual Coordinate Ascent Martin Jaggi, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I Jordan
ICML 2013 Block-Coordinate Frank-Wolfe Optimization for Structural SVMs Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, Patrick Pletscher
ICML 2013 Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization Martin Jaggi
AISTATS 2012 Regularization Paths with Guarantees for Convex Semidefinite Optimization Joachim Giesen, Martin Jaggi, Soeren Laue
ICML 2010 A Simple Algorithm for Nuclear Norm Regularized Problems Martin Jaggi, Marek Sulovský