Joshi, Nitish

7 publications

ICLR 2026. Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models. Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi, Mark Yatskar
ICLR 2026. Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort. Xinpeng Wang, Nitish Joshi, Barbara Plank, Rico Angell, He He
ICLR 2026. Monitoring Decomposition Attacks with Lightweight Sequential Monitors. Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He
ICLRW 2025. Monitoring LLM Agents for Sequentially Contextual Harm. Chen Yueh-Han, Nitish Joshi, Yulin Chen, He He, Rico Angell
ICLR 2025. Transformers Struggle to Learn to Search. Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Mehran Kazemi, Najoung Kim, He He
TMLR 2024. Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation. Aahlad Manas Puli, Nitish Joshi, Yoav Wald, He He, Rajesh Ranganath
NeurIPS 2023. Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples. Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Mehran Kazemi, Najoung Kim, He He