Brown, Hannah

2 publications

AAAI 2025. Single Character Perturbations Break LLM Alignment. Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh.
NeurIPSW 2023. AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments. Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi.