STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search

Abstract

Recent text-to-image (T2I) diffusion models have demonstrated remarkable capabilities in visual synthesis, yet their performance relies heavily on the quality of input prompts. Optimizing discrete prompts remains challenging for two reasons: the discrete nature of tokens prevents the direct application of gradient descent, and the search space of possible token combinations is vast. As a result, existing approaches either suffer from quantization errors when employing continuous optimization techniques or become trapped in local optima due to coordinate-wise greedy search. In this paper, we propose STEPS, a novel Sequential probability Tensor Estimation approach for hard Prompt Search. Our method reformulates discrete prompt optimization as a sequential probability tensor estimation problem, leveraging the inherent low-rank characteristics to address the curse of dimensionality. To further improve computational efficiency, we develop a memory-bounded sampling approach that shrinks the prompt space without depending on the iteration step while preserving sequential optimization dynamics. Extensive experiments on various public datasets demonstrate that our method consistently outperforms existing approaches in T2I generation, cross-model prompt transferability, and harmful prompt optimization, validating the effectiveness of the proposed framework.
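
To give a rough sense of why a low-rank structure helps with the curse of dimensionality, the sketch below compares the number of entries in a full probability tensor over all prompts with the parameter count of a low-rank factorization. It is illustrative only: the abstract does not say which factorization STEPS uses, so the tensor-train form here is an assumption, and the function names, vocabulary size, prompt length, and rank are hypothetical.

```python
def full_tensor_entries(vocab_size: int, prompt_len: int) -> int:
    """Entries in a dense probability tensor with one mode per token position."""
    return vocab_size ** prompt_len


def tt_param_count(vocab_size: int, prompt_len: int, rank: int) -> int:
    """Parameters in a tensor-train factorization with uniform internal rank.

    Assumed core shapes (not taken from the paper):
    first core (1, V, r), middle cores (r, V, r), last core (r, V, 1).
    """
    if prompt_len == 1:
        return vocab_size
    boundary = 2 * vocab_size * rank
    middle = (prompt_len - 2) * vocab_size * rank * rank
    return boundary + middle


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the scaling gap.
    V, L, r = 1000, 8, 4
    print("dense entries:", full_tensor_entries(V, L))
    print("tensor-train parameters:", tt_param_count(V, L, r))
```

The dense count grows exponentially with prompt length, while the factorized count grows linearly in prompt length for a fixed rank, which is the general reason a low-rank assumption makes estimation over token combinations tractable.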

Cite

Text

Qiu et al. "STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02667

Markdown

[Qiu et al. "STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/qiu2025cvpr-steps/) doi:10.1109/CVPR52734.2025.02667

BibTeX

@inproceedings{qiu2025cvpr-steps,
  title     = {{STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search}},
  author    = {Qiu, Yuning and Wang, Andong and Li, Chao and Huang, Haonan and Zhou, Guoxu and Zhao, Qibin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {28640--28650},
  doi       = {10.1109/CVPR52734.2025.02667},
  url       = {https://mlanthology.org/cvpr/2025/qiu2025cvpr-steps/}
}