Towards Sequence Modeling Alignment Between Tokenizer and Autoregressive Model

Abstract

Autoregressive image generation aims to predict the next token based on previous ones. However, this process is challenged by the bidirectional dependencies inherent in conventional image tokenizations, which create a fundamental misalignment with the unidirectional nature of autoregressive models. To resolve this, we introduce AliTok, a novel Aligned Tokenizer that alters the dependency structure of the token sequence. AliTok employs a bidirectional encoder constrained by a causal decoder, a design that compels the encoder to produce a token sequence with both semantic richness and forward-dependency. Furthermore, by incorporating prefix tokens and employing a two-stage tokenizer training process to enhance reconstruction performance, AliTok achieves high fidelity and predictability simultaneously. Building upon AliTok, a standard decoder-only autoregressive model with just 177M parameters achieves a gFID of 1.44 and an IS of 319.5 on ImageNet-256. Scaling to 662M parameters, our model reaches a gFID of 1.28, surpassing the SOTA diffusion method with 10x faster sampling. On ImageNet-512, our 318M model also achieves a SOTA gFID of 1.39. Code and weights are available at https://github.com/ali-vilab/alitok.
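The distinction between bidirectional and forward-only (causal) dependency that the abstract relies on can be illustrated with a minimal sketch. This is not AliTok's implementation: the `attention` helper is hypothetical, it uses identity projections and a single head, and it only shows how a causal mask stops later tokens from influencing earlier positions, whereas unmasked attention lets every position depend on every token.

```python
import numpy as np

def attention(x, causal):
    """Toy single-head self-attention with identity projections (illustrative assumption)."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if causal:
        # Mask future positions: token i may attend only to tokens 0..i.
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; masked entries become exactly zero weight.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))   # 6 tokens, 4 dimensions (arbitrary sizes)
perturbed = tokens.copy()
perturbed[-1] += 1.0               # change only the last token

def rows_changed(causal):
    """Boolean vector: which output positions react to the perturbation."""
    before = attention(tokens, causal)
    after = attention(perturbed, causal)
    return ~np.isclose(before, after).all(axis=1)

causal_changed = rows_changed(causal=True)
bidirectional_changed = rows_changed(causal=False)

print("causal:       ", causal_changed)
print("bidirectional:", bidirectional_changed)
```

Under the causal mask only the last position reacts to the change, while without the mask every position does; the latter is the kind of bidirectional dependency the abstract describes as misaligned with next-token prediction.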

Cite

Text

Wu et al. "Towards Sequence Modeling Alignment Between Tokenizer and Autoregressive Model." International Conference on Learning Representations, 2026.

Markdown

[Wu et al. "Towards Sequence Modeling Alignment Between Tokenizer and Autoregressive Model." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wu2026iclr-sequence/)

BibTeX

@inproceedings{wu2026iclr-sequence,
  title     = {{Towards Sequence Modeling Alignment Between Tokenizer and Autoregressive Model}},
  author    = {Wu, Pingyu and Zhu, Kai and Liu, Yu and Tang, Longxiang and Yang, Jian and Peng, Yansong and Zhai, Wei and Cao, Yang and Zha, Zheng-Jun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wu2026iclr-sequence/}
}