Thilak, Vimal
8 publications
ICML
2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
ICLRW
2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
NeurIPSW
2024
Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning