ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Abstract

Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, and both character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM. Our code can be found at https://github.com/GenSI-THUAIR/SLM.
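The abstract describes generation as progressive candidate pruning over simplex centroids. The sketch below is a rough illustration of that idea only. It is not the authors' implementation. It assumes that the centroid of a shortlist is the uniform distribution over the surviving categories, so a trajectory runs from the centre of the full probability simplex (uniform over every category) to a vertex (a one-hot vector). The `scores` argument is a hypothetical stand-in for a trained denoiser's predictions. Here the scores stay fixed, whereas a real model would presumably re-predict them from the current state at every step. `cfg_logits` uses the common classifier-free guidance form `uncond + w * (cond - uncond)`, and the paper's "flexible implementation" of guidance may differ. All function names and the shrinking `schedule` are hypothetical.

```python
from typing import List, Sequence


def centroid(shortlist: Sequence[int], vocab_size: int) -> List[float]:
    """Centroid of the simplex face spanned by `shortlist`:
    equal mass on each shortlisted category, zero elsewhere."""
    members = set(shortlist)
    weight = 1.0 / len(members)
    return [weight if i in members else 0.0 for i in range(vocab_size)]


def cfg_logits(cond: Sequence[float], uncond: Sequence[float], w: float) -> List[float]:
    """Common classifier-free guidance mix: uncond + w * (cond - uncond).
    w = 0 gives the unconditional logits, w = 1 the conditional ones,
    and w > 1 extrapolates beyond the conditional prediction."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


def shortlist_generate(scores: Sequence[float], schedule: Sequence[int]) -> List[List[float]]:
    """Toy progressive pruning for one discrete variable (hypothetical).

    Starts at the centroid of the full simplex (uniform over all categories)
    and, for each k in `schedule`, keeps the k highest-scoring survivors.
    The state therefore moves through centroids of ever-smaller faces and
    ends at a vertex (one-hot vector) once k == 1.
    """
    vocab_size = len(scores)
    shortlist = list(range(vocab_size))
    trajectory = [centroid(shortlist, vocab_size)]
    for k in schedule:
        if not 1 <= k <= len(shortlist):
            raise ValueError("each k must be between 1 and the current shortlist size")
        # Rank the current survivors by score (ties go to the lower index).
        shortlist = sorted(shortlist, key=lambda i: (-scores[i], i))[:k]
        trajectory.append(centroid(shortlist, vocab_size))
    return trajectory


if __name__ == "__main__":
    # Hypothetical conditional / unconditional logits for a 4-category variable.
    cond = [0.0, 2.0, 1.0, 1.5]
    uncond = [0.5, 1.0, 1.0, 1.0]
    guided = cfg_logits(cond, uncond, w=2.0)  # [-0.5, 3.0, 1.0, 2.0]
    for state in shortlist_generate(guided, schedule=[2, 1]):
        print(state)
    # [0.25, 0.25, 0.25, 0.25]
    # [0.0, 0.5, 0.0, 0.5]
    # [0.0, 1.0, 0.0, 0.0]
```

Because every intermediate state in this toy is a centroid, it is fully described by its shortlist rather than by an arbitrary point in the simplex. That is one possible reading of the abstract's claim of reduced generation complexity. It is an interpretation for illustration, not a statement taken from the paper.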

Cite

Text

Song et al. "ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation." Advances in Neural Information Processing Systems, 2025.

BibTeX

@inproceedings{song2025neurips-shortlisting,
  title     = {{ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation}},
  author    = {Song, Yuxuan and Zhang, Zhe and Pei, Yu and Gong, Jingjing and Yu, Qiying and Zhang, Zheng and Wang, Mingxuan and Zhou, Hao and Liu, Jingjing and Ma, Wei-Ying},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/song2025neurips-shortlisting/}
}