Sondej, Filip

3 publications

AAAI 2025. Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems. Pierre Peigné, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schröder de Witt, Esben Kran
NeurIPSW 2024. Ablation Is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction. Yushi Yang, Filip Sondej, Harry Mayne, Adam Mahdi
NeurIPSW 2024. Toxic Neurons Aren’t Enough to Explain DPO: A Mechanistic Analysis for Toxicity Reduction. Yushi Yang, Filip Sondej, Harry Mayne, Adam Mahdi