SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines

Abstract

Deep-learning-based lossless compression is important in real-world applications such as cold data persistence, sensor data collection, and astronomical data transmission. However, existing compressors typically model data using single-byte symbols as tokens, which makes it hard to capture the inherent correlations in the data and prevents them from effectively exploiting the parallelism of GPUs and multi-core CPUs. This paper proposes SEP, a novel lossless compression framework applicable to most time-series backbone neural networks. We first introduce a semantic enhancement module to capture the complex intra-patch relationships of binary byte streams. To improve compression speed, we design multi-stream pipelines that dynamically assign parallel tasks to GPU streams and multiple CPU cores. We further propose a novel GPU memory optimization strategy that reuses GPU memory through a pool shared across streams. Experiments on seven real-world datasets demonstrate that SEP outperforms state-of-the-art compressors, with an average speed improvement of 30.0% and an average compression ratio gain of 5.1%, which rises to 7.6% when pre-trained models are used. The GPU memory footprint is reduced by up to 63.1%, and by 36.2% on average. The source code is available at: https://github.com/damonwan1/SEP.
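The following is a minimal CPU-only sketch of three ideas named in the abstract: grouping bytes into patches instead of single-byte tokens, running independent chunks through parallel workers, and reusing a fixed pool of buffers across those workers. It is not the SEP implementation. `zlib` stands in for the neural compressor, threads stand in for GPU streams, and every name here (`to_patches`, `BufferPool`, `compress_parallel`, the chunk and pool sizes) is a hypothetical chosen for illustration.

```python
import queue
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def to_patches(data: bytes, patch_size: int) -> List[bytes]:
    """Group a byte stream into fixed-size patches (the last may be shorter)."""
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


class BufferPool:
    """A fixed set of reusable buffers shared by all workers (assumed design)."""

    def __init__(self, num_buffers: int, buffer_size: int) -> None:
        self._free = queue.Queue()
        for _ in range(num_buffers):
            self._free.put(bytearray(buffer_size))

    def acquire(self) -> bytearray:
        # Blocks until some worker returns a buffer, so memory use is capped.
        return self._free.get()

    def release(self, buf: bytearray) -> None:
        self._free.put(buf)


def compress_chunk(chunk: bytes, pool: BufferPool) -> bytes:
    """Stage one chunk in a pooled buffer, then compress it (zlib stand-in)."""
    buf = pool.acquire()
    try:
        n = len(chunk)
        buf[:n] = chunk  # reuse the buffer instead of allocating a new one
        return zlib.compress(bytes(buf[:n]))
    finally:
        pool.release(buf)


def compress_parallel(data: bytes, chunk_size: int = 1 << 16,
                      num_streams: int = 4) -> List[bytes]:
    """Compress independent chunks concurrently, preserving their order."""
    chunks = to_patches(data, chunk_size)
    pool = BufferPool(num_buffers=num_streams, buffer_size=chunk_size)
    with ThreadPoolExecutor(max_workers=num_streams) as executor:
        return list(executor.map(lambda c: compress_chunk(c, pool), chunks))


def decompress_parallel(blobs: List[bytes]) -> bytes:
    """Invert compress_parallel by decompressing and concatenating in order."""
    return b"".join(zlib.decompress(b) for b in blobs)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    blobs = compress_parallel(payload, chunk_size=4096, num_streams=4)
    # Lossless compression must reproduce the input exactly.
    assert decompress_parallel(blobs) == payload
    print(len(payload), "bytes ->", sum(len(b) for b in blobs), "bytes")
```

Because the pool holds only as many buffers as there are workers, staging memory stays constant regardless of input size. This sketch only illustrates the reuse pattern; the memory savings reported in the abstract refer to SEP's own GPU strategy.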

Cite

Text

Wan et al. "SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/370

Markdown

[Wan et al. "SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/wan2025ijcai-sep/) doi:10.24963/IJCAI.2025/370

BibTeX

@inproceedings{wan2025ijcai-sep,
  title     = {{SEP: A General Lossless Compression Framework with Semantics Enhancement and Multi-Stream Pipelines}},
  author    = {Wan, Meng and Cao, Rongqiang and Li, Yanghao and Wang, Jue and Wang, Zijian and Su, Qi and Qiu, Lei and Shi, Peng and Wang, Yangang and Li, Chong},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {3326--3334},
  doi       = {10.24963/IJCAI.2025/370},
  url       = {https://mlanthology.org/ijcai/2025/wan2025ijcai-sep/}
}