TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching

Abstract

Fine-tuning is the de facto approach for adapting large language models (LLMs) to downstream tasks, but the high training memory consumption inherited from LLMs makes this process inefficient. Among existing memory-efficient approaches, activation-related optimization has proven particularly effective, as activations consistently dominate overall memory consumption. Although prior work offers various activation optimization strategies, their data-agnostic nature ultimately results in ineffective and unstable fine-tuning. In this paper, we propose TokenSeek, a universal plugin solution for various transformer-based models that uses instance-aware token seeking and ditching to achieve significant fine-tuning memory savings (e.g., requiring only 14.8% of the memory on Llama3.2 1B) with on-par or even better performance. Furthermore, our interpretable token seeking process reveals the underlying reasons for its effectiveness, offering valuable insights for future research on token efficiency. Homepage: runjia.tech/iclr_tokenseek.

Cite

Text

Zeng et al. "TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching." International Conference on Learning Representations, 2026.

Markdown

[Zeng et al. "TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zeng2026iclr-tokenseek/)

BibTeX

@inproceedings{zeng2026iclr-tokenseek,
  title     = {{TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching}},
  author    = {Zeng, Runjia and Wang, Qifan and Guan, Qiang and Tang, Ruixiang and Huang, Lifu and Wang, Zhenting and Zhang, Xueling and Han, Cheng and Liu, Dongfang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zeng2026iclr-tokenseek/}
}