Mwigo, Brian

1 publication

TMLR 2026. Generalization Bound for a Shallow Transformer Trained Using Gradient Descent. Brian Mwigo, Anirban Dasgupta.