TY - JOUR
T1 - Enhanced Context Mining and Filtering for Learned Video Compression
AU - Guo, Haifeng
AU - Kwong, Sam
AU - Ye, Dongjie
AU - Wang, Shiqi
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - The Deep Contextual Video Compression framework (DCVC) utilizes a conditional coding paradigm, where the context is extracted and employed as a condition for the contextual encoder-decoder and entropy model. In this paper, we propose enhanced context mining and filtering to improve the compression efficiency of DCVC. Firstly, considering the context of DCVC is generated without supervision and redundancy may exist among context channels, an enhanced context mining model is proposed to mitigate redundancy across context channels to obtain superior context features. Then, we introduce a transformer-based enhancement network as a filtering module to capture long-distance dependencies and further enhance compression efficiency. The transformer-based enhancement adopts a full-resolution pipeline and calculates self-attention across channel dimensions. By combining the local modeling ability of the enhanced context mining model and the non-local modeling ability of the transformer-based enhancement network, our model outperforms LDP configurations of Versatile Video Coding (VVC), achieving an average bit savings of 6.7% in terms of MS-SSIM.
AB - The Deep Contextual Video Compression framework (DCVC) utilizes a conditional coding paradigm, where the context is extracted and employed as a condition for the contextual encoder-decoder and entropy model. In this paper, we propose enhanced context mining and filtering to improve the compression efficiency of DCVC. Firstly, considering the context of DCVC is generated without supervision and redundancy may exist among context channels, an enhanced context mining model is proposed to mitigate redundancy across context channels to obtain superior context features. Then, we introduce a transformer-based enhancement network as a filtering module to capture long-distance dependencies and further enhance compression efficiency. The transformer-based enhancement adopts a full-resolution pipeline and calculates self-attention across channel dimensions. By combining the local modeling ability of the enhanced context mining model and the non-local modeling ability of the transformer-based enhancement network, our model outperforms LDP configurations of Versatile Video Coding (VVC), achieving an average bit savings of 6.7% in terms of MS-SSIM.
KW - end-to-end training approach
KW - enhanced context mining
KW - in loop filtering
KW - Learned video compression
UR - https://www.scopus.com/pages/publications/85181570202
U2 - 10.1109/TMM.2023.3316429
DO - 10.1109/TMM.2023.3316429
M3 - 文章
AN - SCOPUS:85181570202
SN - 1520-9210
VL - 26
SP - 3814
EP - 3826
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -