[Video Compression] Plug-and-Play Versatile Compressed Video Enhancement

2025. 9. 12. 09:50

https://arxiv.org/abs/2504.15380

 


 

 

1. Introduction

Video : the most popular multimedia format

Bandwidth constraints during transmission (compression at varying levels)
=>
poor visual quality and suboptimal performance on downstream tasks

 

Previous methods

1. Separate enhancement models for each compression level
2. Improve generalization across diverse compression levels (during training, randomly sample inputs of different compression levels)

=> limited improvement... (and neglects downstream tasks in real-world scenarios)

 

A favorable solution

1. Adaptively enhance videos of varying compression levels with a single model
2. Assist various downstream tasks on compressed videos in a plug-and-play manner
3. Meet practical scenarios where real-time processing is required

=> codec-aware enhancement framework  (Fig.1)

 

Key technique

  • Compression-aware adaptation (CAA) network
  • Bitstream-aware enhancement (BAE) network
 

Contributions

  • Present a codec-aware framework for versatile compressed video enhancement
  • Develop a compression-aware adaptation (CAA) network and a bitstream-aware enhancement (BAE) network
  • Experimental results show the superiority of this method over existing enhancement methods

 

 

2. Related Work

2.1. Compressed Video Enhancement

1) In-loop methods : embed filters in the encoding and decoding loops
=> not suitable for enhancing already-compressed videos

2) Post-processing methods : place a filter at the decoder side

  • MFQE / MFQE 2.0 : Detect Peak Quality Frames (PQFs) with SVM / BiLSTM detectors
  • STDF : Handles inaccurate optical flow via spatio-temporal deformable convolution
  • S2SVR : Models long-range dependencies using sequence-to-sequence learning.

However, most of these methods:

  • Require separate models for each compression level, limiting adaptability.
  • Focus only on I/P-frames.

Our approach introduces a hierarchical adaptation mechanism to handle all frame types and varied compression levels within a unified model.

 

2.2. Codec-Aware Video Super-Resolution

Recent VSR methods incorporate codec information (e.g., motion vectors, spatial priors) to enhance reconstruction:

  • COMISR : Reduces warping errors from intra-frame randomness.
  • Chen et al. : Use motion vectors to enhance temporal consistency and suppress artifacts.
  • CVCP : Employs soft alignment and spatial feature transformation guided by codec data.
  • CIAF : Leverages motion vectors and residuals to model temporal relations and reduce computation.

However, these methods are task-specific (only for VSR). They do not generalize to broader downstream tasks.

Our framework not only achieves competitive results in VSR, but also effectively supports diverse vision tasks such as optical flow estimation and object segmentation, demonstrating broader applicability.

 

2.3. Dynamic Neural Networks

Dynamic neural networks adapt parameters or structure per input to avoid separate models:

  • MoE: Parallel branches selectively activated for weighted outputs.
  • Dynamic Parameter Ensemble: Fuse preset expert layers’ parameters for better generalization.
  • Gain-tune: Predicts channel-wise scaling to adapt static models.
  • Dynamic Transfer: Combines residual and static convolutions to handle multiple domains.
  • DRConv: Learns masks to apply region-specific filters efficiently.
 

Our method uses bitstream priors as conditions for dynamic adaptation, avoiding parameter search or learning.

 

 

3. Preliminaries

3.1. Hierarchical Quality Adjustment

Constant Rate Factor (CRF)
CRF ranges from 0 to 51 and trades off compression efficiency against visual quality (lower CRF = higher quality, larger file)

Sequence-wise CRF (CRF_s)
By conditioning on the sequence-wise CRF, the enhancement network can be tailored to handle videos of different compression levels

Frame-wise CRF (CRF_i)
The CRF value of each frame is adjusted based on CRF_s: lower CRF_i is assigned to I/P frames to maintain quality, and higher CRF_i to B frames for a more compact representation
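The hierarchical CRF assignment can be sketched as follows; the per-frame-type offsets here are illustrative assumptions, not the codec's exact rule:

```python
def frame_crf(crf_s: int, frame_type: str) -> int:
    """Derive a frame-wise CRF_i from the sequence-wise CRF_s.

    I/P frames get a lower CRF (better quality, they serve as references);
    B frames get a higher CRF (more compact). Offsets are illustrative.
    """
    offsets = {"I": -2, "P": 0, "B": +2}  # assumed offsets, for illustration
    return max(0, min(51, crf_s + offsets[frame_type]))  # clamp to CRF range

print([frame_crf(30, t) for t in "IPB"])  # -> [28, 30, 32]
```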

 

 

3.2. Redundancy Reduction

Partition Map
each frame is partitioned into blocks of varying sizes
=> We propose dynamically assigning filters based on the partition map, which indicates region complexity

Motion Vector
motion vectors describe the relationship between the current frame and its reference frames in a block-wise manner
=> Although less precise than optical flow, motion vectors can effectively align reference frames with the current frame

(Figures: partition map, motion vector; img source : https://deeprender.ai/blog/motion-compensation-and-prediction-h265)
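Block-wise motion compensation, as a decoder performs it, can be sketched in a few lines; this is a minimal illustration with integer-pixel motion vectors, not the paper's alignment module:

```python
import numpy as np

def mv_warp(reference: np.ndarray, mvs: np.ndarray, block: int = 8) -> np.ndarray:
    """Warp a reference frame toward the current frame using one integer
    (dy, dx) motion vector per block, copying the referenced block.

    reference: (H, W) array; mvs: (H//block, W//block, 2) integer offsets.
    """
    h, w = reference.shape
    out = np.zeros_like(reference)
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = mvs[by, bx]
            # clamp the source block so it stays inside the frame
            y0 = int(np.clip(by * block + dy, 0, h - block))
            x0 = int(np.clip(bx * block + dx, 0, w - block))
            out[by*block:(by+1)*block, bx*block:(bx+1)*block] = \
                reference[y0:y0+block, x0:x0+block]
    return out
```

With all-zero motion vectors the warp is the identity; nonzero vectors copy each block from its displaced location in the reference frame.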

 

 

 

4. Codec-Aware Enhancement Framework

4.1. Overview

 

4.2. Compression-Aware Adaptation Network

The CAA network G_φ utilizes CRF_s to estimate sequence-adaptive parameters and performs frame adaptation based on CRF_i

 

Sequence adaptation
we propose estimating sequence-adaptive parameters θ_s for the enhancement network

 

The sequence-adaptive parameters are estimated from CRF_s only once and reused for all subsequent frames

 

Frame adaptation
we propose re-weighting the sequence-adaptive f_(θ_s) using the frame-wise CRF_i

 

The re-weighted f_(θ_s) is used to construct the enhancement blocks, resulting in the frame-adaptive BAE network F_(θ_i)
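The two-stage adaptation can be sketched as a tiny hypernetwork; the random weights, the normalisation by 51, and the ratio-based frame re-weighting are all illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
W_hyper = rng.standard_normal((4, 1)) * 0.1  # toy hypernetwork weights (random, for illustration)

def sequence_adapt(crf_s: float) -> np.ndarray:
    """Predict sequence-adaptive parameters theta_s from CRF_s (run once per video)."""
    x = np.array([[crf_s / 51.0]])        # normalise CRF to [0, 1]
    return 1.0 + (W_hyper @ x).ravel()    # per-channel scales around 1

def frame_adapt(theta_s: np.ndarray, crf_i: float, crf_s: float) -> np.ndarray:
    """Re-weight theta_s with the frame-wise CRF_i (assumed ratio-based rule)."""
    return theta_s * (crf_i / max(crf_s, 1e-6))

theta_s = sequence_adapt(30)            # estimated once, reused by every frame
theta_i = frame_adapt(theta_s, 32, 30)  # e.g. a B frame with higher CRF_i
```

The key point mirrored here is the cost split: the sequence branch runs once per video, while the per-frame re-weighting is a cheap elementwise operation.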

 

4.3. Bitstream-Aware Enhancement Network

The BAE network F_(θ_i) utilizes motion vectors to align reference frames

 

Motion Vector alignment
The warped reference features are concatenated with the current-frame features along the channel dimension as the input of the BAE network

 

Region-aware alignment
We propose dynamically assigning different filters to regions based on the partition map.
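Region-aware filter assignment can be sketched as follows; the real network uses learned convolutional filters, whereas this toy version applies a per-region scalar picked by the partition map:

```python
import numpy as np

def region_aware_filter(feat: np.ndarray, partition_map: np.ndarray, filters) -> np.ndarray:
    """Apply a filter chosen per region by the partition map.

    feat: (H, W) feature map; partition_map: (H, W) ints indexing into `filters`.
    A stand-in for dynamic filter assignment: real filters would be learned convs.
    """
    out = np.empty_like(feat)
    for idx, f in enumerate(filters):
        mask = partition_map == idx
        out[mask] = feat[mask] * f  # simplest possible "filter": a scalar gain
    return out
```

Regions encoded with small partitions (complex content) and large partitions (flat content) thereby receive different processing from a single forward pass.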

 

4.4. Loss Function

We adopt the Charbonnier penalty loss as the loss function to train the proposed codec-aware enhancement framework
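The Charbonnier penalty is a smooth, robust variant of the L1 loss, L = mean(sqrt((pred - target)^2 + ε^2)); a minimal numpy version:

```python
import numpy as np

def charbonnier(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Charbonnier penalty: differentiable everywhere (unlike L1 at zero),
    less sensitive to outliers than L2."""
    return float(np.mean(np.sqrt((pred - target) ** 2 + eps ** 2)))
```

For large errors it behaves like |pred - target|; near zero, eps keeps the gradient well defined.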

 

 

 

5. Experiments

5.1. Experimental Settings

5.2. Results

Evaluations are two-fold:
1) verifying quality-enhancement performance on seen, unseen, and highly compressed scenarios
2) evaluating the versatility to assist different downstream tasks under multiple compression settings

 

Quantitative results
Metrics : Param/M, FLOPs/G, Speed/ms, FPS, PSNR, SSIM

 

Qualitative results

 

5.2.2. Versatility Evaluation

Video Super-resolution
We adopt BasicVSR, IconVSR, and BasicVSR++, which are trained on 'clean' data without considering compression, and evaluate on the REDS4 dataset

Metrics : PSNR, SSIM

Results
Pre-enhancing inputs with Metabit fails to improve the performance of downstream VSR models, while our framework yields consistent improvements

 

 

Optical Flow Estimation
We adopt RAFT, DEQ, and KPAFlow and evaluate on the KITTI-2015 dataset

Metrics : EPE (end-point-error), F1-all loss
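Both metrics have simple definitions: EPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and KITTI's F1-all counts a pixel as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude:

```python
import numpy as np

def epe(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """Average end-point error over all pixels; flows are (H, W, 2)."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))

def f1_all(flow_pred: np.ndarray, flow_gt: np.ndarray) -> float:
    """Fraction of outlier pixels: error > 3 px AND > 5% of GT magnitude."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    mag = np.linalg.norm(flow_gt, axis=-1)
    return float(np.mean((err > 3.0) & (err > 0.05 * mag)))
```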

Results
reduces the EPE and F1-all loss across all baseline models


 

 

Video Object Segmentation
We adopt STCN, DeAOT, and QDMN and evaluate on the DAVIS-17 val dataset

Metrics : J (region similarity, average IoU), F (boundary similarity), and their average (J&F)

Results
the proposed method shows the best performance in improving accuracy across VOS models
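The J score is just intersection-over-union on binary masks; a minimal sketch (the F boundary score additionally matches contour pixels and is omitted here):

```python
import numpy as np

def jaccard(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """J score: region similarity as IoU of binary segmentation masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union else 1.0  # empty-vs-empty counts as perfect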

 

Video Inpainting
We adopt E^2FGVI on the DAVIS-17 val dataset

Results
Pre-enhancing the compressed inputs with the proposed method notably reduces artifacts and distortions, yielding more visually pleasing results

 

5.3. Ablation Studies

MV alignment
Region-aware refinement
Sequence adaptation
Frame adaptation

=> Each component brings improvement over the baseline

 

6. Conclusion

Methods : versatile codec-aware enhancement framework
Details : adaptively handles diverse compression settings and serves as a plug-and-play enhancement module to consistently boost various downstream tasks
Results : It shows superiority in both enhancement performance and robustness, making it possible to deploy pre-trained models on compressed videos without a significant performance drop

 

 
