InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression

Ye, Haotian; He, Qiyuan; Han, Jiaqi; Li, Puheng; Fan, Jiaojiao; Hao, Zekun; Reda, Fitsum; Balaji, Yogesh; Chen, Huayu; Liu, Sheng; Yao, Angela; Zou, James; Ermon, Stefano; Wang, Haoxiang; Liu, Ming-Yu

ICLR 2026

Abstract

Accurate and efficient discrete video tokenization is essential for processing long video sequences. Yet the inherent complexity and variable information density of videos present a significant bottleneck for current tokenizers, which rigidly compress all content at a fixed rate, leading to redundancy or information loss. Drawing inspiration from Shannon's information theory, this paper introduces InfoTok, a principled framework for adaptive video tokenization. We rigorously prove that existing data-agnostic training methods are suboptimal in representation length, and present a novel evidence lower bound (ELBO)-based algorithm that approaches theoretical optimality. Leveraging this framework, we develop a transformer-based adaptive compressor that enables adaptive tokenization. Empirical results demonstrate state-of-the-art compression performance, saving $20\%$ of tokens without affecting performance, and achieving a $2.3\times$ compression rate while still outperforming prior heuristic adaptive approaches. By allocating tokens according to informational richness, InfoTok enables a more compressed yet accurate tokenization for video representation, offering valuable insights for future research.
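
The abstract's appeal to Shannon's information theory can be illustrated with a toy example. The sketch below is not the paper's algorithm. It only shows the textbook observation that a fixed-rate code spends more bits than a code whose lengths follow $-\log_2 p$ when outcomes are unequally likely. The distribution `probs` and all function names are hypothetical, chosen for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fixed_length_bits(num_symbols):
    """Bits per symbol when every outcome gets the same code length."""
    return math.ceil(math.log2(num_symbols))

def shannon_code_lengths(probs):
    """Shannon code lengths: ceil(-log2 p) bits for each outcome."""
    return [math.ceil(-math.log2(p)) for p in probs]

# Hypothetical toy source: one common outcome and three rarer ones.
probs = [0.5, 0.25, 0.125, 0.125]

# Expected length when code length adapts to how likely each outcome is.
adaptive = sum(p * l for p, l in zip(probs, shannon_code_lengths(probs)))
# Length when every outcome is coded at the same fixed rate.
fixed = fixed_length_bits(len(probs))

print(entropy_bits(probs), adaptive, fixed)  # prints: 1.75 1.75 2
```

Here the adaptive code matches the entropy of 1.75 bits, while the fixed-rate code uses 2 bits per outcome. Adaptive tokenizers are motivated by the same gap, although how InfoTok closes it for video is described in the paper itself and not reproduced here.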

Cite

Text

Ye et al. "InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression." International Conference on Learning Representations, 2026.

Markdown

[Ye et al. "InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/ye2026iclr-infotok/)

BibTeX

@inproceedings{ye2026iclr-infotok,
  title     = {{InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression}},
  author    = {Ye, Haotian and He, Qiyuan and Han, Jiaqi and Li, Puheng and Fan, Jiaojiao and Hao, Zekun and Reda, Fitsum and Balaji, Yogesh and Chen, Huayu and Liu, Sheng and Yao, Angela and Zou, James and Ermon, Stefano and Wang, Haoxiang and Liu, Ming-Yu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/ye2026iclr-infotok/}
}