Wang, Yibo
28 publications
NeurIPS
2025
Mulberry: Empowering MLLM with O1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
NeurIPS
2025
Panacea: Mitigating Harmful Fine-Tuning for Large Language Models via Post-Fine-Tuning Perturbation
NeurIPS
2025
R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO
NeurIPS
2025
SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models
NeurIPS
2025
Triplets Better than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs
NeurIPS
2024
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
AAAI
2024
Non-Stationary Projection-Free Online Learning with Dynamic and Adaptive Regret Guarantees