Paulo, Gonçalo

3 publications

ICLR 2026. Evaluating SAE Interpretability Without Generating Explanations. Gonçalo Paulo, Nora Belrose
ICLR 2026. Sparse Autoencoders Trained on the Same Data Learn Different Features. Gonçalo Paulo, Nora Belrose
AAAI 2025. Do Transformer Interpretability Methods Transfer to RNNs? Gonçalo Paulo, Thomas Marshall, Nora Belrose