Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models

¹University of Cambridge, ²Microsoft Research Asia. *Equal contribution

Method overview

Comparison between different representation methods: (a) Explicit representations encode signals into symbolic latent variables. (b) Implicit representations encode signal information implicitly in functions. (c) Adapting a generative model can itself serve as an implicit visual representation.
A detailed illustration of our method: adaptation of a pretrained diffusion foundation model.

Representation results

Reconstruction quality vs. training step. (a) Common LoRA representations with different ranks for the image Kodim03 from the Kodak dataset. (b) One-vector representations for the image Kodim03, varying the LoRA rank and the vector size after hashing. (c) One-vector representations for the video Beauty from the UVG dataset.
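
To make the role of the LoRA rank in panel (a) concrete, the following is a minimal sketch assuming the standard LoRA formulation, in which a frozen weight matrix W is adapted as W + BA with B of shape (d_out, r) and A of shape (r, d_in). The layer size used here (1024 x 1024) and the helper names are illustrative assumptions, not values or code from the paper.

```python
from typing import List

Matrix = List[List[float]]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Number of trainable values in a rank-`rank` LoRA update B @ A."""
    return rank * (d_out + d_in)


def apply_lora(W: Matrix, B: Matrix, A: Matrix) -> Matrix:
    """Return W + B @ A using plain Python lists (illustrative, not optimized)."""
    rank = len(A)
    return [
        [
            W[i][j] + sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]


# Hypothetical 1024 x 1024 layer: a full update vs. LoRA updates of a few ranks.
full_params = 1024 * 1024
for r in (1, 4, 16):
    print(r, lora_param_count(1024, 1024, r), full_params)
```

Under this formulation the number of values to store grows linearly with the rank, which is why lower ranks correspond to more compact representations.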

Compression Performance

Comparison of video codecs on UVG. For DISTS, FVD, and LPIPS, lower is better; for PSNR, higher is better.

Compression inference-time scaling

We identify a key advantage of functional representations: they naturally support inference-time scaling for better performance. In our framework, we can generate multiple samples per denoising step and select the most promising one to improve compression quality.
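
The selection loop described above can be sketched as follows. This is a toy illustration, not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for one stochastic denoising step of a diffusion model, and scoring candidates by negative squared error against a reference signal is an assumption, since the selection criterion is not specified here.

```python
import random
from typing import Callable, List

Vector = List[float]


def toy_denoise_step(x: Vector, noise_scale: float, rng: random.Random) -> Vector:
    """Hypothetical stand-in for one stochastic denoising step (not a real model)."""
    return [0.5 * v + rng.gauss(0.0, noise_scale) for v in x]


def best_of_n_sampling(
    x_init: Vector,
    num_steps: int,
    samples_per_step: int,
    score: Callable[[Vector], float],
    seed: int = 0,
) -> Vector:
    """At every step, draw several candidates and keep the highest-scoring one."""
    rng = random.Random(seed)
    x = list(x_init)
    for step in range(num_steps):
        # Noise shrinks as sampling proceeds (illustrative schedule).
        noise_scale = 1.0 - step / num_steps
        candidates = [
            toy_denoise_step(x, noise_scale, rng) for _ in range(samples_per_step)
        ]
        x = max(candidates, key=score)
    return x


def neg_squared_error(reference: Vector) -> Callable[[Vector], float]:
    """Assumed scoring rule: higher is better, so negate the squared error."""
    def score(x: Vector) -> float:
        return -sum((a - b) ** 2 for a, b in zip(x, reference))
    return score


reference = [0.2, -0.1, 0.4]          # made-up target signal
score_fn = neg_squared_error(reference)
start = [1.0, 1.0, 1.0]               # made-up initial state

single = best_of_n_sampling(start, num_steps=1, samples_per_step=1, score=score_fn)
scaled = best_of_n_sampling(start, num_steps=1, samples_per_step=8, score=score_fn)
print(score_fn(single), score_fn(scaled))
```

With the same seed and a single step, the first of the eight candidates is identical to the lone sample, so the selected candidate scores at least as well. Over multiple steps the selection is greedy, so an improvement is expected in practice but not guaranteed for every seed.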

Reconstruction quality under inference-time scaling, for different numbers of steps and different sample sizes per step.

Beyond Compression: Image and Video Editing

The LoRA-based representations can be used for controlled generation, such as image editing or merging.

Video editing results obtained by changing "dog" in the prompt to "panda".

More RD Curves

Comparison of video codecs on HEVC B. For DISTS, FVD, and LPIPS, lower is better; for PSNR, higher is better.
Comparison of video codecs on HEVC C. For DISTS, FVD, and LPIPS, lower is better; for PSNR, higher is better.
Comparison of video codecs on HEVC E. For DISTS, FVD, and LPIPS, lower is better; for PSNR, higher is better.

BibTeX

@misc{he2026compression,
  title={Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models},
  author={Jiajun He and Zongyu Guo and Zhaoyang Jia and Xiaoyi Zhang and Jiahao Li and Xiao Li and Bin Li and José Miguel Hernández-Lobato and Yan Lu},
  year={2026},
  eprint={2603.07615},
}