Wang, Shijin
27 publications
ICLR
2026
ChemEval: A Multi-Level and Fine-Grained Chemical Capability Evaluation for Large Language Models
ICLR
2026
Fewer Battles, More Gain: An Information-Efficient Framework for Arena-Based LLM Evaluation
ICML
2025
CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective
ICLR
2025
Evaluating Large Language Models Through Role-Guide and Self-Reflection: A Comparative Study
NeurIPS
2025
FACT: Mitigating Inconsistent Hallucinations in LLMs via Fact-Driven Alternating Code-Text Training
AAAI
2025
Multi-Perspective Consolidation Enhanced Cognitive Diagnosis via Conditional Diffusion Model