[Video Compression] Plug-and-Play Versatile Compressed Video Enhancement

2025. 9. 12. 09:50

https://arxiv.org/abs/2504.15380

 


 

 

1. Introduction

Video : the most popular multimedia format

Bandwidth constraints during transmission (compression at varying levels)
=>
poor visual quality and suboptimal performance on downstream tasks

 

Previous methods

1. Separate enhancement models for each compression level
2. Improve generalization across diverse compression levels (during training, randomly sample inputs of different compression levels)

=> limited improvement... (and neglects downstream tasks in real-world scenarios)

 

A favorable solution

1. Adaptively enhance videos of varying compression levels with a single model
2. Assist various downstream tasks on compressed videos in a plug-and-play manner
3. Meet practical scenarios where real-time processing is required

=> codec-aware enhancement framework  (Fig.1)

 

Key technique

  • Compression-aware adaptation (CAA) network
  • Bitstream-aware enhancement (BAE) network
 

Contributions

  • Present a codec-aware framework for versatile compressed video enhancement
  • Develop a compression-aware adaptation (CAA) network and a bitstream-aware enhancement (BAE) network
  • Experimental results show the superiority of this method over existing enhancement methods

 

 

2. Related Work

2.1. Compressed Video Enhancement

1) In-loop methods : embed filters in the encoding and decoding loops
=> not suitable for enhancing already-compressed videos

2) Post-processing methods : place a filter at the decoder side

  • MFQE / MFQE 2.0 : Detect Peak Quality Frames (PQFs) with SVM / BiLSTM detectors
  • STDF : Handles inaccurate optical flow via spatio-temporal deformable convolution
  • S2SVR : Models long-range dependencies using sequence-to-sequence learning.

However, most of these methods:

  • Require separate models for each compression level, limiting adaptability.
  • Focus only on I/P-frames.

Our approach introduces a hierarchical adaptation mechanism to handle all frame types and varied compression levels within a unified model.

 

2.2. Codec-Aware Video Super-Resolution

Recent VSR methods incorporate codec information (e.g., motion vectors, spatial priors) to enhance reconstruction:

  • COMISR : Reduces warping errors from intra-frame randomness.
  • Chen et al. : Use motion vectors to enhance temporal consistency and suppress artifacts.
  • CVCP : Employs soft alignment and spatial feature transformation guided by codec data.
  • CIAF : Leverages motion vectors and residuals to model temporal relations and reduce computation.

However, these methods are task-specific (only for VSR). They do not generalize to broader downstream tasks.

Our framework not only achieves competitive results in VSR, but also effectively supports diverse vision tasks such as optical flow estimation and object segmentation, demonstrating broader applicability.

 

2.3. Dynamic Neural Networks

Dynamic neural networks adapt parameters or structure per input to avoid separate models:

  • MoE: Parallel branches selectively activated for weighted outputs.
  • Dynamic Parameter Ensemble: Fuse preset expert layers’ parameters for better generalization.
  • Gain-tune: Predicts channel-wise scaling to adapt static models.
  • Dynamic Transfer: Combines residual and static convolutions to handle multiple domains.
  • DRConv: Learns masks to apply region-specific filters efficiently.
 

Our method uses bitstream priors as conditions for dynamic adaptation, avoiding parameter search or learning.

 

 

3. Preliminaries

3.1. Hierarchical Quality Adjustment

Constant Rate Factor (CRF)
CRF ranges from 0 to 51 and trades off compression efficiency against visual quality (lower CRF = higher quality, larger file)

Sequence-wise CRF (CRF_s)
By conditioning on the sequence-wise CRF, the enhancement network can be tailored to handle videos of different compression levels

Frame-wise CRF (CRF_i)
The CRF value of each frame is adjusted based on CRF_s: lower CRF_i is assigned to I/P frames to maintain quality, and higher CRF_i to B frames for a more compact representation
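The hierarchical CRF assignment can be sketched as follows; the per-frame-type offsets here are illustrative assumptions, not the codec's exact rule:

```python
def frame_crf(crf_s: int, frame_type: str) -> int:
    """Derive a frame-wise CRF_i from the sequence-wise CRF_s.

    I/P frames get a lower CRF (better quality, they serve as references);
    B frames get a higher CRF (more compact). Offsets are illustrative.
    """
    offsets = {"I": -2, "P": 0, "B": +2}  # assumed offsets, for illustration
    return max(0, min(51, crf_s + offsets[frame_type]))  # clamp to CRF range

print([frame_crf(30, t) for t in "IPB"])  # -> [28, 30, 32]
```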

 

 

3.2. Redundancy Reduction

Partition Map
each frame is partitioned into blocks of varying sizes
=> We propose dynamically assigning filters based on the partition map, which indicates region complexity

Motion Vector
motion vectors describe the relationship between the current frame and its reference frames in a block-wise manner
=> Although less precise than optical flow, motion vectors can effectively align reference frames with the current frame

(Figures: partition map, motion vector; img source : https://deeprender.ai/blog/motion-compensation-and-prediction-h265)
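Block-wise motion compensation, as a decoder performs it, can be sketched in a few lines; this is a minimal illustration with integer-pixel motion vectors, not the paper's alignment module:

```python
import numpy as np

def mv_warp(reference: np.ndarray, mvs: np.ndarray, block: int = 8) -> np.ndarray:
    """Warp a reference frame toward the current frame using one integer
    (dy, dx) motion vector per block, copying the referenced block.

    reference: (H, W) array; mvs: (H//block, W//block, 2) integer offsets.
    """
    h, w = reference.shape
    out = np.zeros_like(reference)
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = mvs[by, bx]
            # clamp the source block so it stays inside the frame
            y0 = int(np.clip(by * block + dy, 0, h - block))
            x0 = int(np.clip(bx * block + dx, 0, w - block))
            out[by*block:(by+1)*block, bx*block:(bx+1)*block] = \
                reference[y0:y0+block, x0:x0+block]
    return out
```

With all-zero motion vectors the warp is the identity; nonzero vectors copy each block from its displaced location in the reference frame.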

 

 

 

4. Codec-Aware Enhancement Framework

4.1. Overview

 

4.2. Compression-Aware Adaptation Network

The CAA network G_φ utilizes CRF_s to estimate sequence-adaptive parameters and performs frame adaptation based on CRF_i

 

Sequence adaptation
we propose estimating sequence-adaptive parameters θ_s for the enhancement network

 

The sequence-adaptive parameters are estimated from CRF_s only once and reused for all subsequent frames

 

Frame adaptation
we propose re-weighting the sequence-adaptive f_(θ_s) using the frame-wise CRF_i

 

The re-weighted f_(θ_s) is used to construct the enhancement blocks, resulting in the frame-adaptive BAE network F_(θ_i)
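The two-stage adaptation can be sketched as a tiny hypernetwork; the random weights, the normalisation by 51, and the ratio-based frame re-weighting are all illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
W_hyper = rng.standard_normal((4, 1)) * 0.1  # toy hypernetwork weights (random, for illustration)

def sequence_adapt(crf_s: float) -> np.ndarray:
    """Predict sequence-adaptive parameters theta_s from CRF_s (run once per video)."""
    x = np.array([[crf_s / 51.0]])        # normalise CRF to [0, 1]
    return 1.0 + (W_hyper @ x).ravel()    # per-channel scales around 1

def frame_adapt(theta_s: np.ndarray, crf_i: float, crf_s: float) -> np.ndarray:
    """Re-weight theta_s with the frame-wise CRF_i (assumed ratio-based rule)."""
    return theta_s * (crf_i / max(crf_s, 1e-6))

theta_s = sequence_adapt(30)            # estimated once, reused by every frame
theta_i = frame_adapt(theta_s, 32, 30)  # e.g. a B frame with higher CRF_i
```

The key point mirrored here is the cost split: the sequence branch runs once per video, while the per-frame re-weighting is a cheap elementwise operation.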

 

4.3. Bitstream-Aware Enhancement Network

The BAE network F_(θ_i) utilizes motion vectors to align reference frames

 

Motion Vector alignment
The warped reference features are concatenated with the current-frame features along the channel dimension as the input of the BAE network

 

Region-aware alignment
We propose dynamically assigning different filters to regions based on the partition map.
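Region-aware filter assignment can be sketched as follows; the real network uses learned convolutional filters, whereas this toy version applies a per-region scalar picked by the partition map:

```python
import numpy as np

def region_aware_filter(feat: np.ndarray, partition_map: np.ndarray, filters) -> np.ndarray:
    """Apply a filter chosen per region by the partition map.

    feat: (H, W) feature map; partition_map: (H, W) ints indexing into `filters`.
    A stand-in for dynamic filter assignment: real filters would be learned convs.
    """
    out = np.empty_like(feat)
    for idx, f in enumerate(filters):
        mask = partition_map == idx
        out[mask] = feat[mask] * f  # simplest possible "filter": a scalar gain
    return out
```

Regions encoded with small partitions (complex content) and large partitions (flat content) thereby receive different processing from a single forward pass.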

 

4.4. Loss Function

We adopt the Charbonnier penalty loss as the loss function to train the proposed codec-aware enhancement framework
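The Charbonnier penalty is a smooth, robust variant of the L1 loss, L = mean(sqrt((pred - target)^2 + ε^2)); a minimal numpy version:

```python
import numpy as np

def charbonnier(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Charbonnier penalty: differentiable everywhere (unlike L1 at zero),
    less sensitive to outliers than L2."""
    return float(np.mean(np.sqrt((pred - target) ** 2 + eps ** 2)))
```

For large errors it behaves like |pred - target|; near zero, eps keeps the gradient well defined.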

 

 

 

5. Experiments

5.1. Experimental Settings

5.2. Results

Evaluations are two-fold:
1) verifying quality-enhancement performance on seen, unseen, and highly compressed scenarios
2) evaluating the versatility to assist different downstream tasks under multiple compression settings

 

Quantitative results
Metrics : Param/M, FLOPs/G, Speed/ms, FPS, PSNR, SSIM

 

Qualitative results

 

5.2.2. Versatility Evaluation

Video Super-resolution
We adopt BasicVSR, IconVSR, and BasicVSR++, which are trained on 'clean' data without considering compression, and evaluate on the REDS4 dataset

Metrics : PSNR, SSIM

Results
Pre-enhancing inputs with Metabit fails to improve the performance of downstream VSR models, while our framework yields consistent improvements

 

 

Optical Flow Estimation
We adopt RAFT, DEQ, and KPAFlow and evaluate on the KITTI-2015 dataset

Metrics : EPE (end-point-error), F1-all loss
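Both metrics have simple definitions: EPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and KITTI's F1-all counts a pixel as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude:

```python
import numpy as np

def epe(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """Average end-point error over all pixels; flows are (H, W, 2)."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))

def f1_all(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """Fraction of outlier pixels: error > 3 px AND > 5% of GT magnitude."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    mag = np.linalg.norm(flow_gt, axis=-1)
    return float(np.mean((err > 3.0) & (err > 0.05 * mag)))
```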

Results
reduces the EPE and F1-all loss across all baseline models


 

 

Video Object Segmentation
We adopt STCN, DeAOT, and QDMN and evaluate on the DAVIS-17 val dataset

Metrics : J (region similarity, average IoU), F (boundary similarity), and their average (J&F)

Results
the proposed method shows the best performance in improving accuracy across VOS models
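The J score is just intersection-over-union on binary masks; a minimal sketch (the F boundary score additionally matches contour pixels and is omitted here):

```python
import numpy as np

def jaccard(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """J score: region similarity as IoU of binary segmentation masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union else 1.0  # empty-vs-empty counts as perfect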

 

Video Inpainting
We adopt E^2FGVI on the DAVIS-17 val dataset

Results
Pre-enhancing the compressed inputs with the proposed method notably reduces artifacts and distortions, yielding more visually pleasing results

 

5.3. Ablation Studies

MV alignment
Region-aware refinement
Sequence adaptation
Frame adaptation

=> Each component brings improvement over the baseline

 

6. Conclusion

Methods : versatile codec-aware enhancement framework
Details : adaptively handles diverse compression settings and serves as a plug-and-play enhancement module to consistently boost various downstream tasks
Results : It shows superiority in both enhancement performance and robustness, making it possible to deploy pre-trained models on compressed videos without a significant performance drop

 

 
