Abdelnabi, Sahar

13 publications

ICLR 2025 | Can LLMs Separate Instructions from Data? and What Do We Even Mean by That? | Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H. Lampert
NeurIPS 2025 | Contextual Integrity in LLMs via Reasoning and Reinforcement Learning | Guangchen Lan, Huseyin A Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher Brinton, Robert Sim
ICLRW 2025 | Safety Is Essential for Responsible Open-Ended Systems | Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz
TMLR 2025 | Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models | Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
NeurIPS 2025 | The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness | Sahar Abdelnabi, Ahmed Salem
ICLRW 2024 | Can LLMs Separate Instructions from Data? and What Do We Even Mean by That? | Egor Zverev, Sahar Abdelnabi, Mario Fritz, Christoph H. Lampert
NeurIPS 2024 | Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation | Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
NeurIPS 2024 | Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition | Edoardo Debenedetti, Javier Rando, Daniel Paleka, Fineas Silaghi, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr
NeurIPSW 2024 | Hypothesizing Missing Causal Variables with LLMs | Ivaxi Sheth, Sahar Abdelnabi, Mario Fritz
ICMLW 2024 | LLM Sample: Part Average and Part Ideal | Sarath Sivaprasad, Pramod Kaushik, Sahar Abdelnabi, Mario Fritz
ICLRW 2024 | LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Game | Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
CVPR 2022 | Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources | Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
ICCV 2021 | Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data | Ning Yu, Vladislav Skripniuk, Sahar Abdelnabi, Mario Fritz