Baek, David D.

4 publications

ICLR 2026 Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth Jiawei Zhang, Andrew Estornell, David D. Baek, Bo Li, Xiaojun Xu

TMLR 2025 Harmonic Loss Trains Interpretable AI Models David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark

NeurIPS 2025 Scaling Laws for Scalable Oversight Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark

ICLRW 2025 Towards Understanding Distilled Reasoning Models: A Representational Approach David D. Baek, Max Tegmark