Rationalization Models for Text-to-SQL

Abstract

We introduce a framework for generating Chain-of-Thought (CoT) rationales to enhance the fine-tuning of text-to-SQL models. These rationales consist of intermediate SQL statements and explanations, serving as incremental steps toward constructing the final SQL query. The process begins with manually annotating a small set of examples, which are then used to prompt a teacher large language model in an iterative, dynamic few-shot knowledge distillation procedure. A rationalization model is subsequently trained on the validated decomposed queries, enabling large-scale synthetic CoT annotation of text-to-SQL datasets. To evaluate the approach, we fine-tune small language models with and without these rationales on the BIRD dataset. Results indicate that step-by-step query generation improves execution accuracy, especially for moderately and highly complex queries, while also enhancing explainability.
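
To illustrate the kind of rationale the abstract describes, a minimal hypothetical decomposition is sketched below. The question, the schema (orders, customers), and all column names are illustrative assumptions for this sketch, not examples taken from the paper or from BIRD:

-- Question: Which customers placed more than five orders in 2024?

-- Step 1: restrict orders to those placed in 2024.
SELECT order_id, customer_id
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Step 2: count the 2024 orders per customer.
SELECT customer_id, COUNT(*) AS n_orders
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
GROUP BY customer_id;

-- Final query: keep only customers with more than five such orders
-- and return their names.
SELECT c.name
FROM customers AS c
JOIN (
  SELECT customer_id
  FROM orders
  WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
  GROUP BY customer_id
  HAVING COUNT(*) > 5
) AS frequent ON frequent.customer_id = c.customer_id;

Each intermediate statement is executable on its own, which is what allows the decomposed steps to be validated before being used as training data for the rationalization model.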

Cite

Text

Rossiello et al. "Rationalization Models for Text-to-SQL." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.

Markdown

[Rossiello et al. "Rationalization Models for Text-to-SQL." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/rossiello2025iclrw-rationalization/)

BibTeX

@inproceedings{rossiello2025iclrw-rationalization,
  title     = {{Rationalization Models for Text-to-SQL}},
  author    = {Rossiello, Gaetano and Pham, Nhan H. and Glass, Michael and Lee, Junkyu and Subramanian, Dharmashankar},
  booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/rossiello2025iclrw-rationalization/}
}