Patel, Oam

4 publications

ICLR 2026. Priors in Time: Missing Inductive Biases for Language Model Interpretability. Ekdeep Singh Lubana, Can Rager, Sai Sumedh R. Hindupur, Valérie Costa, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Greta Tuckute, Daniel Wurgaft, Eric Bigelow, Demba E. Ba, Melanie Weber, Aaron Mueller.
TMLR 2025. Defending Against Unforeseen Failure Modes with Latent Adversarial Training. Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell.
ICLRW 2024. Preventing Memorized Completions Through White-Box Filtering. Oam Patel, Rowan Wang.
NeurIPS 2023. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg.