Nishikawa, Naoki
8 publications
NeurIPS
2025
Degrees of Freedom for Linear Attention: Distilling SoftMax Attention with Optimal Feature Efficiency
NeurIPS
2025
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
ICML
2025
Mixture of Experts Provably Detect and Learn the Latent Cluster Structure in Gradient-Based Learning
ICMLW
2024
State Space Models Are Comparable to Transformers in Estimating Functions with Dynamic Smoothness