ML Anthology
Obeso, Oscar Balcells
2 publications
ICLR 2025
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Balcells Obeso, Senthooran Rajamanoharan, Neel Nanda
ICMLW 2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi, Oscar Balcells Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda