TY - GEN
T1 - LL-ICM
T2 - 2025 Data Compression Conference, DCC 2025
AU - Xue, Yuan
AU - Zhang, Qi
AU - Jia, Chuanmin
AU - Wang, Shiqi
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Image Compression for Machines (ICM) aims to compress images for machine vision tasks, while current methods mostly focus on the demands of high-level tasks. However, the quality of original images is usually not guaranteed in the real world, leading to even worse downstream task performance after compression. Thus, low-level (LL) restoration tasks should also be considered in ICM. In this paper, we propose the first ICM framework for LL machine vision tasks, namely LL-ICM, which optimizes compression and LL processing performance simultaneously. Moreover, LL-ICM leverages a large vision-language model (VLM) to solve different LL tasks within a single model, which is particularly useful when the distortion type of the original image is uncertain. As illustrated in Fig. 1(a), LL-ICM consists of a neural image codec and a VLM-based LL processing module. Given an original image with distortions, LL-ICM first compresses it as X. Then, we extract a generalized feature F from X, which is encoded into two representations: distortion type f and caption s. After that, the LL processing module receives X and its representations to generate the restored version of X, i.e., XH.
AB - Image Compression for Machines (ICM) aims to compress images for machine vision tasks, while current methods mostly focus on the demands of high-level tasks. However, the quality of original images is usually not guaranteed in the real world, leading to even worse downstream task performance after compression. Thus, low-level (LL) restoration tasks should also be considered in ICM. In this paper, we propose the first ICM framework for LL machine vision tasks, namely LL-ICM, which optimizes compression and LL processing performance simultaneously. Moreover, LL-ICM leverages a large vision-language model (VLM) to solve different LL tasks within a single model, which is particularly useful when the distortion type of the original image is uncertain. As illustrated in Fig. 1(a), LL-ICM consists of a neural image codec and a VLM-based LL processing module. Given an original image with distortions, LL-ICM first compresses it as X. Then, we extract a generalized feature F from X, which is encoded into two representations: distortion type f and caption s. After that, the LL processing module receives X and its representations to generate the restored version of X, i.e., XH.
UR - https://www.scopus.com/pages/publications/105006822956
U2 - 10.1109/DCC62719.2025.00095
DO - 10.1109/DCC62719.2025.00095
M3 - Conference contribution
AN - SCOPUS:105006822956
T3 - Data Compression Conference Proceedings
SP - 408
BT - Proceedings - DCC 2025
A2 - Bilgin, Ali
A2 - Fowler, James E.
A2 - Serra-Sagrista, Joan
A2 - Ye, Yan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 March 2025 through 21 March 2025
ER -