Zhou, Yilun

14 publications

ICLR 2026 DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Kung-Hsiang Huang, Yixin Mao, Chien-Sheng Wu
ICLR 2026 Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, Shafiq Joty
ICLR 2026 On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization Janvijay Singh, Austin Xu, Yilun Zhou, Yefan Zhou, Dilek Hakkani-Tür, Shafiq Joty
ICLR 2026 Variation in Verification: Understanding Verification Dynamics in Large Language Models Yefan Zhou, Austin Xu, Yilun Zhou, Janvijay Singh, Jiang Gui, Shafiq Joty
ICLR 2025 BingoGuard: LLM Content Moderation Tools with Risk Levels Fan Yin, Philippe Laban, Xiangyu Peng, Yilun Zhou, Yixin Mao, Vaibhav Vats, Linnea Ross, Divyansh Agarwal, Caiming Xiong, Chien-Sheng Wu
ICML 2025 Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty
TMLR 2025 Shared Imagination: LLMs Hallucinate Alike Yilun Zhou, Caiming Xiong, Silvio Savarese, Chien-Sheng Wu
NeurIPSW 2023 CHAMP: A Competition-Level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities Yujun Mao, Yoon Kim, Yilun Zhou
AAAI 2023 Explaining Large Language Model-Based Neural Semantic Parsers (Student Abstract) Daking Rai, Yilun Zhou, Bailin Wang, Ziyu Yao
AAAI 2022 Do Feature Attribution Methods Correctly Attribute Features? Yilun Zhou, Serena Booth, Marco Túlio Ribeiro, Julie Shah
AISTATS 2021 Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms Yilun Zhou, Adithya Renduchintala, Xian Li, Sida Wang, Yashar Mehdad, Asish Ghoshal
AAAI 2021 Bayes-TrEx: A Bayesian Sampling Approach to Model Transparency by Example Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah
NeurIPSW 2021 Do Feature Attribution Methods Correctly Attribute Features? Yilun Zhou, Serena Booth, Marco Túlio Ribeiro, Julie Shah
CoRL 2021 RoCUS: Robot Controller Understanding via Sampling Yilun Zhou, Serena Booth, Nadia Figueroa, Julie Shah