ML Anthology
Sondej, Filip
3 publications
AAAI 2025
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigné, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schröder de Witt, Esben Kran
NeurIPSW 2024
Ablation Is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
Yushi Yang, Filip Sondej, Harry Mayne, Adam Mahdi
NeurIPSW 2024
Toxic Neurons Aren’t Enough to Explain DPO: A Mechanistic Analysis for Toxicity Reduction
Yushi Yang, Filip Sondej, Harry Mayne, Adam Mahdi