ML Anthology
Joshi, Nitish
7 publications
ICLR 2026
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi, Mark Yatskar
ICLR 2026
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Xinpeng Wang, Nitish Joshi, Barbara Plank, Rico Angell, He He
ICLR 2026
Monitoring Decomposition Attacks with Lightweight Sequential Monitors
Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He
ICLRW 2025
Monitoring LLM Agents for Sequentially Contextual Harm
Chen Yueh-Han, Nitish Joshi, Yulin Chen, He He, Rico Angell
ICLR 2025
Transformers Struggle to Learn to Search
Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Mehran Kazemi, Najoung Kim, He He
TMLR 2024
Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation
Aahlad Manas Puli, Nitish Joshi, Yoav Wald, He He, Rajesh Ranganath
NeurIPS 2023
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Mehran Kazemi, Najoung Kim, He He