Bhaskar, Adithya

4 publications

ICLR 2025 Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
NeurIPS 2024 Finding Transformer Circuits with Edge Pruning Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
NeurIPSW 2024 Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
NeurIPSW 2024 Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin