We identify a key advantage of functional representation: it naturally supports inference-time scaling for better performance. In our framework, we can generate multiple samples per denoising step and select the most promising one to improve compression quality.
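To make the idea concrete, here is a minimal toy sketch of per-step candidate selection. It is not the paper's implementation. The denoiser (`toy_denoise_step`), the noise schedule, the mean-squared-error selection criterion, and the bookkeeping of chosen indices are all hypothetical stand-ins chosen for illustration. The sketch assumes the encoder has access to the ground truth and can therefore score each candidate against it.

```python
import random


def distortion(x, target):
    """Mean squared error between a candidate and the ground truth."""
    return sum((a - b) ** 2 for a, b in zip(x, target)) / len(target)


def select_best(candidates, target):
    """Return (index, candidate) for the candidate closest to the target."""
    idx = min(range(len(candidates)),
              key=lambda i: distortion(candidates[i], target))
    return idx, candidates[idx]


def toy_denoise_step(x, noise_scale, rng):
    """Hypothetical stochastic update: shrink the state and add fresh noise.

    A real system would call a diffusion model here; this is only a stand-in.
    """
    return [0.9 * xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]


def sample_with_selection(target, num_steps=20, num_candidates=8, seed=0):
    """Run the toy sampler, keeping the best of `num_candidates` at each step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    chosen = []  # index picked at each step (illustrative bookkeeping only)
    for step in range(num_steps):
        noise_scale = 1.0 - step / num_steps  # assumed linear schedule
        candidates = [toy_denoise_step(x, noise_scale, rng)
                      for _ in range(num_candidates)]
        idx, x = select_best(candidates, target)
        chosen.append(idx)
    return x, chosen


if __name__ == "__main__":
    target = [0.5, -0.5, 0.25, 0.0]
    for k in (1, 4, 16):
        x, _ = sample_with_selection(target, num_candidates=k)
        print(k, distortion(x, target))
```

With `num_candidates=1` the loop reduces to ordinary sampling. Raising it spends more compute at inference time in exchange for a sample that tracks the target more closely, which is the trade-off described above.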
Video editing results obtained by changing "dog" in the prompt to "panda".
@misc{he2026compression,
title={Compression as Adaptation: Implicit Visual Representation with Diffusion Foundation Models},
author={Jiajun He and Zongyu Guo and Zhaoyang Jia and Xiaoyi Zhang and Jiahao Li and Xiao Li and Bin Li and José Miguel Hernández-Lobato and Yan Lu},
year={2026},
eprint={2603.07615},
}