Petrova, Nora

2 publications

ICLRW 2025. Latent Adversarial Training Improves the Representation of Refusal. Alexandra Abbas, Nora Petrova, Hélios Lyons, Natalia Perez-Campanero
NeurIPSW 2024. Characterizing Stable Regions in the Residual Stream of LLMs. Jett Janiak, Jacek Karwowski, Chatrik Singh Mangat, Giorgi Giglemiani, Nora Petrova, Stefan Heimersheim