Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs

Abstract

Boolean Satisfiability (SAT) solvers are foundational to computer science, yet their performance typically hinges on hand-crafted heuristics. This work introduces Reinforcement Learning from Algorithm Feedback (RLAF) as a paradigm for learning to guide SAT solver branching heuristics with Graph Neural Networks (GNNs). Central to our approach is a novel and generic mechanism for injecting inferred variable weights and polarities into the branching heuristics of existing SAT solvers. In a single forward pass, a GNN assigns these parameters to all variables. Casting this one-shot guidance as a reinforcement learning problem lets us train the GNN with off-the-shelf policy-gradient methods, such as GRPO, directly using the solver's computational cost as the sole reward signal. Extensive evaluations demonstrate that RLAF-trained policies significantly reduce the mean solve times of different base solvers across diverse SAT problem distributions, achieving more than a 2x speedup in some cases, while generalizing effectively to larger and harder problems after training. Notably, these policies consistently outperform expert-supervised approaches based on learning handcrafted weighting heuristics, offering a promising path towards data-driven heuristic design in combinatorial optimization.

Cite

Text

Tönshoff and Grohe. "Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs." International Conference on Learning Representations, 2026.

Markdown

[Tönshoff and Grohe. "Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/tonshoff2026iclr-learning/)

BibTeX

@inproceedings{tonshoff2026iclr-learning,
  title     = {{Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs}},
  author    = {Tönshoff, Jan and Grohe, Martin},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/tonshoff2026iclr-learning/}
}