Messmer, Bettina

8 publications

NeurIPS 2025. Enhancing Multilingual LLM Pretraining with Model-Based Data Selection. Bettina Messmer, Vinko Sabolčec, Martin Jaggi
ICLRW 2025. Enhancing Multilingual LLM Pretraining with Model-Based Data Selection. Bettina Messmer, Vinko Sabolčec, Martin Jaggi
ICML 2025. On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists. Dongyang Fan, Bettina Messmer, Nikita Doikov, Martin Jaggi
ICLRW 2025. On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists. Dongyang Fan, Bettina Messmer, Nikita Doikov, Martin Jaggi
ICMLW 2024. Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training. Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPS 2024. Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training. Atli Kosson, Bettina Messmer, Martin Jaggi
ICML 2024. Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks. Atli Kosson, Bettina Messmer, Martin Jaggi
NeurIPSW 2023. Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks. Atli Kosson, Bettina Messmer, Martin Jaggi