Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning

Abstract

Large language models have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality, reasoning-focused training datasets. To address this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from authentic data sources. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. As a result, we present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs. Utilizing KPMath and augmenting it with additional reasoning-intensive corpora, we create the comprehensive KPMath-Plus dataset. Our experiments demonstrate that this dataset can enhance the mathematical reasoning performance of models across various architectures and sizes. The Qwen1.5-72B model, fine-tuned on KPMath-Plus, achieves 87.0% accuracy on GSM8K and 58.3% on MATH, surpassing competitors in the 7B to 72B range and the best commercial models, such as GPT-4, across multiple math reasoning datasets.
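The abstract describes KPDDS only at a high level. As a rough illustration of what "leveraging key points and exemplar practices" could mean in practice, the sketch below assumes that each authentic problem is tagged with the key points it exercises. It samples a pair of key points weighted by how often they co-occur in that seed data, picks a seed problem covering both as the exemplar, and builds a generation prompt. The seed problems, key-point labels, co-occurrence-weighted sampling, and prompt wording are all hypothetical. This is not the authors' implementation, and the LLM call and quality-control stages are omitted.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical seed data: authentic problems tagged with the key points they exercise.
SEED_PROBLEMS = [
    {"question": "A shirt costs $20 after a 20% discount. What was the original price?",
     "key_points": ["percentages", "linear equations"]},
    {"question": "Solve 3x + 5 = 20.",
     "key_points": ["linear equations"]},
    {"question": "A rectangle has perimeter 30 and length twice its width. Find its area.",
     "key_points": ["linear equations", "area of rectangles"]},
    {"question": "Both sides of a 5 by 8 rectangle grow by 10%. By what percent does its area grow?",
     "key_points": ["percentages", "area of rectangles"]},
    {"question": "Tom scores 18 out of 24. What percent is that?",
     "key_points": ["percentages", "linear equations"]},
]


def cooccurrence_counts(problems):
    """Count how often each unordered pair of key points appears in the same seed problem."""
    counts = Counter()
    for p in problems:
        for a, b in combinations(sorted(set(p["key_points"])), 2):
            counts[(a, b)] += 1
    return counts


def sample_key_point_pair(counts, rng):
    """Sample one key-point pair with probability proportional to its co-occurrence count."""
    pairs = list(counts)
    weights = [counts[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def pick_exemplar(problems, pair):
    """Return the first seed problem covering both key points (the exemplar), or None."""
    for p in problems:
        if set(pair) <= set(p["key_points"]):
            return p
    return None


def build_prompt(pair, exemplar):
    """Assemble a hypothetical prompt asking an LLM for a new problem combining the key points."""
    return (
        "Write a new math word problem with a step-by-step solution that combines "
        f"the key points: {', '.join(pair)}.\n"
        f"Exemplar problem: {exemplar['question']}"
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible sampling
    counts = cooccurrence_counts(SEED_PROBLEMS)
    pair = sample_key_point_pair(counts, rng)
    exemplar = pick_exemplar(SEED_PROBLEMS, pair)
    print(build_prompt(pair, exemplar))
```

Weighting by co-occurrence keeps sampled combinations close to those seen in real problems; because every sampled pair comes from some seed problem, an exemplar always exists. A real pipeline would send the prompt to a model and filter the resulting question-answer pairs.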

Cite

Text

Huang et al. "Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I23.34593

Markdown

[Huang et al. "Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/huang2025aaai-key/) doi:10.1609/AAAI.V39I23.34593

BibTeX

@inproceedings{huang2025aaai-key,
  title     = {{Key-Point-Driven Data Synthesis with Its Enhancement on Mathematical Reasoning}},
  author    = {Huang, Yiming and Liu, Xiao and Gong, Yeyun and Gou, Zhibin and Shen, Yelong and Duan, Nan and Chen, Weizhu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {24176--24184},
  doi       = {10.1609/AAAI.V39I23.34593},
  url       = {https://mlanthology.org/aaai/2025/huang2025aaai-key/}
}