SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs

Abstract

Generating layouts from textual descriptions with large language models (LLMs) plays a crucial role in domains that demand precise spatial reasoning, such as robotic object rearrangement and text-to-image generation. However, current methods are hampered by limited real-world examples, diverse layout descriptions, and varying levels of granularity. To address these issues, a novel framework named Spatial Knowledge Enhanced Layout (SKE-Layout) is introduced. SKE-Layout integrates mixed spatial knowledge sources, leveraging both real and synthetic data to enrich spatial context. It utilizes diverse representations tailored to specific tasks and employs contrastive learning and multitask learning for accurate spatial knowledge retrieval. The framework generates more accurate and fine-grained visual layouts for object rearrangement and text-to-image generation tasks, improving on existing methods by 5%-30%.
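The retrieve-then-prompt loop the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words embedding stands in for SKE-Layout's contrastively trained encoder, and the tiny in-memory `store` stands in for its mixed real/synthetic spatial knowledge base; all names here are hypothetical.

```python
# Hedged sketch: retrieval-augmented layout prompting.
# Embed the layout description, retrieve the most similar stored
# spatial example, and prepend it to the LLM prompt as context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a learned spatial-text encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: dict) -> str:
    # Return the stored layout whose description best matches the query.
    q = embed(query)
    best = max(store, key=lambda desc: cosine(q, embed(desc)))
    return store[best]

def build_prompt(query: str, store: dict) -> str:
    # Prepend the retrieved spatial example to the generation request,
    # as an LLM-facing prompt would.
    example = retrieve(query, store)
    return (f"Example layout:\n{example}\n\n"
            f"Now generate a bounding-box layout for: {query}")

# Toy knowledge base: description -> layout as object bounding boxes.
store = {
    "a cup to the left of a plate": "cup: (10, 40, 30, 60); plate: (40, 35, 80, 65)",
    "a lamp above a desk": "lamp: (30, 5, 50, 25); desk: (10, 50, 90, 90)",
}
print(build_prompt("a bowl to the left of a tray", store))
```

The query "a bowl to the left of a tray" shares its relational phrase with the first stored description, so that example is retrieved and supplied as in-context spatial grounding.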

Cite

Text

Wang et al. "SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01808

Markdown

[Wang et al. "SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/wang2025cvpr-skelayout/) doi:10.1109/CVPR52734.2025.01808

BibTeX

@inproceedings{wang2025cvpr-skelayout,
  title     = {{SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs}},
  author    = {Wang, Junsheng and Cao, Nieqing and Ding, Yan and Xie, Mengying and Gu, Fuqiang and Chen, Chao},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {19414--19423},
  doi       = {10.1109/CVPR52734.2025.01808},
  url       = {https://mlanthology.org/cvpr/2025/wang2025cvpr-skelayout/}
}