Obeso, Oscar Balcells

2 publications

ICLR 2025. Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models. Javier Ferrando, Oscar Balcells Obeso, Senthooran Rajamanoharan, Neel Nanda
ICMLW 2024. Refusal in Language Models Is Mediated by a Single Direction. Andy Arditi, Oscar Balcells Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda