Cutting Through the Noise: Boosting LLM Performance on Math Word Problems
Abstract
Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. We introduce a dataset, ProbleMathic, containing both adversarial and non-adversarial MWPs. Our experiments reveal that LLMs are susceptible to distraction by numerical noise, resulting in an average relative performance drop of ~26% on adversarial MWPs. To mitigate this, we fine-tune LLMs (Qwen-2, Mistral) on the adversarial samples from our dataset. Fine-tuning on adversarial training instances improves performance on adversarial MWPs by ~8%, indicating increased robustness to noise and an improved ability to identify the data relevant for reasoning. Finally, to assess the generalizability of our prompting framework, we introduce GSM-8K-Adv, an adversarial variant of the GSM-8K benchmark. LLMs continue to struggle when faced with adversarial information, with performance drops of up to ~24%.
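The paper's framework uses LLM prompting to inject irrelevant variables; as a toy illustration of the underlying idea, the sketch below appends a templated distractor sentence (an unused numeric fact) to a math word problem. The templates and helper name are hypothetical, not from the paper.

```python
import random

# Hypothetical distractor templates; the actual framework generates
# adversarial variants via LLM prompting, not fixed templates.
IRRELEVANT_TEMPLATES = [
    "Her neighbor owns {n} bicycles.",
    "The town library holds {n} books.",
    "A nearby farm has {n} chickens.",
]

def add_numerical_noise(problem: str, seed: int = 0) -> str:
    """Insert a distractor sentence with an unused number into a word problem."""
    rng = random.Random(seed)
    distractor = rng.choice(IRRELEVANT_TEMPLATES).format(n=rng.randint(2, 99))
    # Place the distractor just before the final (question) sentence when possible.
    sentences = problem.rstrip().split(". ")
    if len(sentences) > 1:
        sentences.insert(len(sentences) - 1, distractor.rstrip("."))
        return ". ".join(sentences)
    return problem + " " + distractor

example = "Sam has 3 apples and buys 4 more. How many apples does Sam have?"
print(add_numerical_noise(example))
```

The correct answer is unchanged by construction, so a model's accuracy gap between the original and noised variants isolates its susceptibility to numerical distraction.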
Cite
Text
Anantheswaran et al. "Cutting Through the Noise: Boosting LLM Performance on Math Word Problems." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.
Markdown
[Anantheswaran et al. "Cutting Through the Noise: Boosting LLM Performance on Math Word Problems." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/anantheswaran2025iclrw-cutting/)
BibTeX
@inproceedings{anantheswaran2025iclrw-cutting,
title = {{Cutting Through the Noise: Boosting LLM Performance on Math Word Problems}},
author = {Anantheswaran, Ujjwala and Gupta, Himanshu and Scaria, Kevin and Verma, Shreyas and Baral, Chitta and Mishra, Swaroop},
booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/anantheswaran2025iclrw-cutting/}
}