XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation

Abstract

Creating a new font library is labor-intensive and time-consuming for glyph-rich scripts. Few-shot font generation is therefore desirable, as it needs only a few reference glyphs and no fine-tuning at test time. Existing methods follow the style-content disentanglement paradigm and expect novel fonts to be produced by combining the style codes of the reference glyphs with the content representations of the source. However, these methods either fail to capture content-independent style representations or employ localized, component-wise style representations, which are insufficient to model the many Chinese font styles that involve hyper-component features such as inter-component spacing and "connected-stroke". To resolve these drawbacks and make the style representations more reliable, we propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder that is conditioned jointly on the glyph image and the corresponding stroke labels. The cross-modality encoder is pre-trained in a self-supervised manner to capture cross- and intra-modality correlations effectively, which facilitates content-style disentanglement and the modeling of style representations at all scales (stroke level, component level, and character level). The pre-trained encoder is then applied to the downstream font generation task without fine-tuning. Experimental comparisons with state-of-the-art methods demonstrate that our method successfully transfers styles at all scales. In addition, it requires only one reference glyph and achieves the lowest rate of bad cases in the few-shot font generation task (28% lower than the second best).

Cite

Text

Liu et al. "XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00775

Markdown

[Liu et al. "XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/liu2022cvpr-xmpfont/) doi:10.1109/CVPR52688.2022.00775

BibTeX

@inproceedings{liu2022cvpr-xmpfont,
  title     = {{XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation}},
  author    = {Liu, Wei and Liu, Fangyue and Ding, Fei and He, Qian and Yi, Zili},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {7905-7914},
  doi       = {10.1109/CVPR52688.2022.00775},
  url       = {https://mlanthology.org/cvpr/2022/liu2022cvpr-xmpfont/}
}