Patel, Oam

3 publications

TMLR 2025 Defending Against Unforeseen Failure Modes with Latent Adversarial Training Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell
ICLRW 2024 Preventing Memorized Completions Through White-Box Filtering Oam Patel, Rowan Wang
NeurIPS 2023 Inference-Time Intervention: Eliciting Truthful Answers from a Language Model Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg