Consistency Flow Matching: Single-Step Generation With General Corruption Processes
Jerry Liu, Stanford University
Diffusion models have driven significant advances in generating high-quality images, audio, and video across many domains. However, traditional methods rely on sampling processes requiring many sequential function evaluations, limiting their real-time generation capabilities. Efforts have been made to leverage more sophisticated ODE solvers to reduce the number of function evaluations per generative trajectory, but these methods suffer image quality degradation as the number of diffusion steps decreases. An interesting class of recent methods instead trains a single-step generative model, either by distilling a pretrained multi-step diffusion model or by training with a more complex procedure from scratch. However, these techniques are computationally expensive, and the design space of distillation-like models remains underexplored and poorly understood.
In particular, existing work focuses on diffusion-like processes involving additive Gaussian noise, while it has been shown that more general linear corruption processes may also be employed to develop generative models. In this work, we investigate the selection of the degradation process and evaluate its effect on trajectory curvatures throughout distillation. Our experiments show that we are able to train a single-step generative model using more general corruption processes, including Gaussian blur and masking. Furthermore, we demonstrate the importance of choosing appropriate data-to-noise couplings to ensure efficient distillation.
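To make the setup concrete, the flow-matching interpolant between a data sample and its corrupted counterpart can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function names (`gaussian_blur_1d`, `masking`, `flow_matching_pair`) and the 1-D signal setting are assumptions chosen for brevity, and the corruption endpoint replaces the usual Gaussian-noise endpoint with a general linear degradation such as blur or masking.

```python
import numpy as np

def gaussian_blur_1d(x, sigma=2.0):
    # Illustrative corruption: Gaussian blur of a 1-D signal via convolution.
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")

def masking(x, frac=0.5, seed=0):
    # Illustrative corruption: zero out a random fraction of coordinates.
    rng = np.random.default_rng(seed)
    return x * (rng.random(x.shape) > frac)

def flow_matching_pair(x0, corruption, t):
    # Linear interpolant x_t = (1 - t) * x0 + t * x1 between clean data x0
    # and its corruption x1, together with the constant velocity target
    # x1 - x0 that a flow-matching model would regress onto.
    x1 = corruption(x0)
    xt = (1 - t) * x0 + t * x1
    return xt, x1 - x0
```

Swapping `corruption=gaussian_blur_1d` for `corruption=masking` changes the degradation process (and hence the trajectory geometry) without altering the rest of the training pipeline, which is the design axis the abstract investigates.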
Abstract Author(s): Jerry Liu, Chris Ré