Baek, David D.

4 publications

ICLR 2026. Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth. Jiawei Zhang, Andrew Estornell, David D. Baek, Bo Li, Xiaojun Xu
TMLR 2025. Harmonic Loss Trains Interpretable AI Models. David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark
NeurIPS 2025. Scaling Laws for Scalable Oversight. Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark
ICLRW 2025. Towards Understanding Distilled Reasoning Models: A Representational Approach. David D. Baek, Max Tegmark