Angell, Rico

9 publications

ICLR 2026 Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort Xinpeng Wang, Nitish Joshi, Barbara Plank, Rico Angell, He He

ICLR 2026 Jailbreak Transferability Emerges from Shared Representations Rico Angell, Jannik Brinkmann, He He

ICLR 2026 Monitoring Decomposition Attacks with Lightweight Sequential Monitors Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He

ICLRW 2025 Monitoring LLM Agents for Sequentially Contextual Harm Chen Yueh-Han, Nitish Joshi, Yulin Chen, He He, Rico Angell

ICML 2024 Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching Rico Angell, Andrew McCallum

NeurIPS 2024 Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Riggs Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks

ICMLW 2024 Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Riggs Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks

ICML 2022 Interactive Correlation Clustering with Existential Cluster Constraints Rico Angell, Nicholas Monath, Nishant Yadav, Andrew McCallum

NeurIPS 2018 Inferring Latent Velocities from Weather Radar Data Using Gaussian Processes Rico Angell, Daniel R. Sheldon