ML Anthology
Petrova, Nora
2 publications
ICLRW 2025
Latent Adversarial Training Improves the Representation of Refusal
Alexandra Abbas, Nora Petrova, Hélios Lyons, Natalia Perez-Campanero
NeurIPSW 2024
Characterizing Stable Regions in the Residual Stream of LLMs
Jett Janiak, Jacek Karwowski, Chatrik Singh Mangat, Giorgi Giglemiani, Nora Petrova, Stefan Heimersheim