All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

Abstract

Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, which incurs high parameter and bitrate overhead, or they struggle with multi-objective optimization under a unified representation, and so fail to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA), which is integrated into existing coding models for joint human-machine vision and unifies the feature representation in an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths according to feature importance, which varies across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. By reusing shared features, as few as 1.89% of the parameters need to be added and fine-tuned for a specific task, which avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model. Code is available at https://github.com/NJUVISION/MPA.
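
As a rough illustration of the routing idea described in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes latent features arrive as a set of tokens, that a predictor has already produced one importance score per token, and that the highest-scoring fraction is sent through a task-specific path while the remainder goes through a shared path. The function name `multi_path_aggregate`, the `ratio` parameter, the linear scoring, and the two toy paths are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def multi_path_aggregate(tokens, scores, shared_path, task_path, ratio):
    """Send the top `ratio` fraction of tokens (by score) through task_path,
    the rest through shared_path, and merge results in the original order.

    All names here are illustrative assumptions, not the paper's API.
    """
    n = tokens.shape[0]
    k = int(round(ratio * n))                   # number of task-specific tokens
    order = np.argsort(-scores, kind="stable")  # indices, highest score first
    task_idx, shared_idx = order[:k], order[k:]

    out = np.empty_like(tokens)
    out[shared_idx] = shared_path(tokens[shared_idx])
    out[task_idx] = task_path(tokens[task_idx])
    return out, task_idx


# Toy data: 8 latent tokens with 4 channels each.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))

# Hypothetical predictor: a single linear projection to one score per token.
w_pred = rng.standard_normal(4)
scores = tokens @ w_pred


def shared(x):
    """Stand-in for the shared path."""
    return x * 2.0


def task(x):
    """Stand-in for a task-specific path."""
    return x + 1.0


out, task_idx = multi_path_aggregate(tokens, scores, shared, task, ratio=0.25)
```

In this sketch, changing `ratio` between 0 and 1 shifts tokens from the shared path to the task-specific path without touching either path's weights, which is one plausible reading of the task-controllable behaviour the abstract mentions.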

Cite

Text

Zhang et al. "All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation." Neural Information Processing Systems, 2024. doi:10.52202/079017-2283

Markdown

[Zhang et al. "All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhang2024neurips-allinone/) doi:10.52202/079017-2283

BibTeX

@inproceedings{zhang2024neurips-allinone,
  title     = {{All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation}},
  author    = {Zhang, Xu and Guo, Peiyao and Lu, Ming and Ma, Zhan},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2283},
  url       = {https://mlanthology.org/neurips/2024/zhang2024neurips-allinone/}
}