Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models

Jiajun He¹*, Zongyu Guo²*, Zhaoyang Jia², Xiaoyi Zhang², Jiahao Li², Xiao Li², Bin Li², José Miguel Hernández-Lobato¹, Yan Lu²

¹University of Cambridge, ²Microsoft Research Asia, *Equal contribution


Compression Performance Comparison

Interactive comparison: Ground Truth (left) versus the selected compressed result (right).

Method overview

Comparison between different representation methods: (a) explicit representations, which encode signals into symbolic latent variables; (b) implicit representations, which encode signal information implicitly in functions; (c) adaptation of generative models, which can serve as an implicit visual representation.

A detailed illustration of our adaptation method in a pretrained diffusion foundation model.

Representation results

Reconstruction quality vs. training step. (a) Common LoRA representations with different ranks for image Kodim03 from the Kodak dataset. (b) One-vector representations for image Kodim03, varying LoRA rank and vector size after hashing. (c) One-vector representations for video Beauty from the UVG dataset.
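For readers less familiar with LoRA, the sketch below shows why rank controls the size of the representation: standard low-rank adaptation adds an update B·A to a frozen weight matrix, so the number of trainable values grows linearly with the rank. The helper names and the 1024×1024 layer size are hypothetical and chosen only for illustration; they are not taken from the paper, and the hashing step used for the one-vector representation is not modeled here.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable values in a rank-`rank` update B @ A for a d_out x d_in weight.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def lora_delta(B, A):
    """Dense update delta_W = B @ A, using plain nested lists."""
    rank = len(A)
    return [
        [sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Hypothetical layer size, for illustration only.
d_in, d_out = 1024, 1024
for rank in (1, 4, 16):
    # Compare the low-rank update size against the full d_out * d_in matrix.
    print(rank, lora_param_count(d_in, d_out, rank), d_in * d_out)
```

Under these assumptions, a low-rank update stores far fewer values than the full weight matrix, which is what makes the adapter itself a candidate for a compact representation.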

Compression Performance

Comparison of video codecs on UVG. For DISTS, FVD, and LPIPS, lower is better. For PSNR, higher is better.

Compression inference-time scaling

We identify a key advantage of functional representations: they naturally support inference-time scaling for better performance. In our framework, we can generate multiple samples per denoising step and select the most promising one to improve compression quality.
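The selection idea can be sketched as a best-of-N loop over denoising steps. This is a minimal toy illustration, not the paper's implementation: `toy_denoise_step` is a hypothetical stand-in for a real diffusion model, and using mean squared error to the target as the measure of "most promising" is an assumption made here for concreteness.

```python
import random


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_denoise_step(state, step, num_steps, rng):
    """Hypothetical stand-in for one stochastic denoising step.

    Shrinks the state and adds Gaussian noise whose scale decays to zero
    by the final step. A real diffusion model would be called here.
    """
    noise_scale = 1.0 - (step + 1) / num_steps
    return [0.9 * x + noise_scale * rng.gauss(0.0, 1.0) for x in state]


def select_best(candidates, target):
    """Index of the candidate with the lowest MSE to the target (assumed criterion)."""
    return min(range(len(candidates)), key=lambda i: mse(candidates[i], target))


def sample_with_selection(target, num_steps=10, num_candidates=4, seed=0):
    """Draw several candidates at every step and keep the best one."""
    rng = random.Random(seed)
    state = [rng.gauss(0.0, 1.0) for _ in target]
    chosen = []
    for step in range(num_steps):
        candidates = [
            toy_denoise_step(state, step, num_steps, rng)
            for _ in range(num_candidates)
        ]
        best = select_best(candidates, target)
        chosen.append(best)
        state = candidates[best]
    return state, chosen


# Example: a larger num_candidates spends more compute per denoising step.
target = [0.5, -0.2, 0.1, 0.8]
final, chosen = sample_with_selection(target, num_candidates=8)
print(mse(final, target), chosen)
```

In this sketch, `num_candidates` is the knob that trades extra inference compute for a closer match to the target at each step.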

Beyond Compression: Image and Video Editing

Editing results using LoRA-based representations. The LoRA-based representations can be used for controlled generation, such as image editing or merging.

Video editing results obtained by changing "dog" in the prompt to "panda".

More RD Curves

Comparison of video codecs on HEVC B. For DISTS, FVD, and LPIPS, lower is better. For PSNR, higher is better.

Comparison of video codecs on HEVC C. For DISTS, FVD, and LPIPS, lower is better. For PSNR, higher is better.

Comparison of video codecs on HEVC E. For DISTS, FVD, and LPIPS, lower is better. For PSNR, higher is better.

BibTeX

@misc{he2026compression,
  title={Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models},
  author={Jiajun He and Zongyu Guo and Zhaoyang Jia and Xiaoyi Zhang and Jiahao Li and Xiao Li and Bin Li and José Miguel Hernández-Lobato and Yan Lu},
  year={2026},
  eprint={2603.07615},
}