FonTS: Text Rendering with Typography and Style Controls

Abstract

Visual text rendering is widespread in real-world applications and requires careful font selection and typographic choices. Recent progress in diffusion transformer (DiT)-based text-to-image (T2I) models shows promise in automating these processes. However, these methods still encounter challenges such as inconsistent fonts, style variation, and limited fine-grained control, particularly at the word level. This paper proposes a two-stage DiT-based pipeline that addresses these problems by enhancing controllability over typography and style in text rendering. We introduce typography control fine-tuning (TC-FT), a parameter-efficient fine-tuning method (on 5% key parameters) with enclosing typography control tokens (ETC-tokens), which enables precise word-level application of typographic features. To further address style inconsistency in text rendering, we propose a text-agnostic style control adapter (SCA) that prevents content leakage while enhancing style consistency. To implement TC-FT and SCA effectively, we incorporate HTML-render into the data synthesis pipeline and propose the first word-level controllable dataset. Through comprehensive experiments, we demonstrate the effectiveness of our approach in achieving superior word-level typographic control, font consistency, and style consistency in text rendering tasks. Our project page is available at this site.
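
As a rough illustration of the idea of enclosing tokens and HTML-based data synthesis described in the abstract, the sketch below wraps selected words in a pair of control tokens for the prompt and emits matching HTML markup that a renderer could turn into a training image. The token strings (`[b]`, `[/b]`, and so on), the `FEATURES` table, and the `mark_words` function are hypothetical and chosen only for this example; the abstract does not specify the actual ETC-token vocabulary or the data format.

```python
from html import escape

# Hypothetical mapping: feature name -> ((prompt open token, prompt close token), HTML tag).
# These token strings are assumptions for illustration, not the paper's vocabulary.
FEATURES = {
    "bold": (("[b]", "[/b]"), "b"),
    "italic": (("[i]", "[/i]"), "i"),
    "underline": (("[u]", "[/u]"), "u"),
}


def mark_words(words, styles):
    """Build a (prompt, html) pair with word-level typographic markup.

    words  -- list of words to render
    styles -- dict mapping a word index to a feature name in FEATURES
    """
    prompt_parts, html_parts = [], []
    for i, word in enumerate(words):
        feature = styles.get(i)
        if feature is None:
            # Unstyled word: passes through unchanged (HTML-escaped on the HTML side).
            prompt_parts.append(word)
            html_parts.append(escape(word))
        else:
            (open_tok, close_tok), tag = FEATURES[feature]
            # The styled word is enclosed by a token pair in the prompt
            # and by the matching tag pair in the HTML source.
            prompt_parts.append(f"{open_tok}{word}{close_tok}")
            html_parts.append(f"<{tag}>{escape(word)}</{tag}>")
    return " ".join(prompt_parts), " ".join(html_parts)


prompt, html_source = mark_words(["Grand", "Opening", "Sale"], {1: "bold"})
print(prompt)
print(html_source)
```

Pairing each prompt with HTML that carries the same word-level markup is one plausible way to obtain aligned text and image supervision; the actual pipeline may differ.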

Cite

Text

Shi et al. "FonTS: Text Rendering with Typography and Style Controls." International Conference on Computer Vision, 2025.

Markdown

[Shi et al. "FonTS: Text Rendering with Typography and Style Controls." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/shi2025iccv-fonts/)

BibTeX

@inproceedings{shi2025iccv-fonts,
  title     = {{FonTS: Text Rendering with Typography and Style Controls}},
  author    = {Shi, Wenda and Song, Yiren and Zhang, Dengming and Liu, Jiaming and Zou, Xingxing},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {18463-18474},
  url       = {https://mlanthology.org/iccv/2025/shi2025iccv-fonts/}
}