
Classification of Generative AI Models
Diffusion Models
Diffusion models are a type of generative model that operates by progressively adding noise to data until it conforms to a simple prior distribution, typically a standard Gaussian. The main idea behind diffusion models is to learn to reverse this diffusion, enabling the generation of valid samples. In the forward pass of a diffusion model, shown in Figure 8, Gaussian noise is iteratively added to the data in a series of steps. This noise corrupts the original data, gradually degrading its quality; as the noise level increases with each step, the images become increasingly distorted until they are destroyed entirely. The objective of the diffusion model is to learn the dynamics of this diffusion process. By observing the corrupted data and the corresponding noise levels, the model learns to estimate the conditional probability distribution that relates the two. Once this is learned, the model can perform the reverse pass, starting from pure noise (maximally corrupted data) and progressively removing noise at each step. This denoising process leads to the generation of valid and realistic samples that resemble the original data distribution.
Figure 8. Typical structure of a diffusion model.
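The forward corruption step described above can be sketched in a few lines. The following numpy sketch is an illustration only (not any particular paper's implementation) and assumes the common variance-preserving formulation, in which the corrupted sample at step t is a weighted mix of the original data and fresh Gaussian noise:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng=None):
    """Sample x_t from the closed-form forward process q(x_t | x_0)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha_bar = np.cumprod(1.0 - betas)[t]     # cumulative signal retention
    noise = rng.standard_normal(x0.shape)      # epsilon ~ N(0, I)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

# Toy example: an 8-pixel "image" diffused under a linear noise schedule.
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.ones(8)
xT, _ = forward_diffuse(x0, t=999, betas=betas)
```

At the final step the cumulative product of signal coefficients is close to zero, so the sample is almost pure Gaussian noise, which is exactly the degradation the reverse pass must undo.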
There are three sub-types that differ in their implementation of the forward and backward diffusion pass. These sub-types are denoising diffusion probabilistic models (DDPMs), score-based generative models (SGMs), and stochastic differential equations (SDEs).
Denoising Diffusion Probabilistic Models (DDPMs): DDPMs incorporate a two-step process for diffusion. They apply Markov chains to progressively corrupt data with Gaussian noise and then reverse this forward diffusion by learning the Markov transition kernels of the denoising direction. DDPMs focus on modeling the diffusion process and its reversal.
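A minimal sketch of the DDPM training objective, assuming the common noise-prediction parameterization; `eps_model` is a hypothetical stand-in for a real neural network:

```python
import numpy as np

def ddpm_loss(eps_model, x0, t, betas, rng):
    """Simplified DDPM objective: predict the noise injected at step t."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)                        # true noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return np.mean((eps - eps_model(xt, t)) ** 2)              # MSE on the noise

# Placeholder "network" that always predicts zero noise (illustration only).
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
loss = ddpm_loss(lambda xt, t: np.zeros_like(xt), np.ones(8), 500, betas, rng)
```

In practice the timestep t is sampled uniformly per training example and the loss is averaged over a minibatch.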
Score-based Generative Models (SGMs): SGMs, also referred to as score-matching models, work directly with the gradient of the log density (score function) of the data. They perturb the data with noise at multiple scales and jointly estimate the score function of all noisy data distributions using a neural network conditioned on different noise levels. This decoupling of training and inference steps enables flexible sampling.
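The sampling side of score-based models can be illustrated with plain Langevin dynamics. This sketch assumes the score function is known exactly (for a standard Gaussian it is simply -x); a real SGM would substitute a learned, noise-conditioned score network:

```python
import numpy as np

def langevin_sample(score_fn, x, step=0.1, n_steps=200, rng=None):
    """Unadjusted Langevin dynamics: drift along the score, inject noise."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        x = x + 0.5 * step * score_fn(x) \
              + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

# For a standard Gaussian the exact score is -x, so the chain should settle
# near zero mean and approximately unit variance.
rng = np.random.default_rng(0)
samples = langevin_sample(lambda x: -x, rng.standard_normal(5000), rng=rng)
```

Annealed versions of this procedure run the chain at a sequence of decreasing noise scales, which is what makes multi-scale perturbation useful at inference time.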
Stochastic Differential Equations (SDEs): SDEs generalize diffusion models into continuous settings. They formulate noise perturbations and denoising processes as solutions to stochastic differential equations. By leveraging the probabilistic flow of these equations, the reverse generation process can be modeled. Probability flow ordinary differential equations (ODEs) can also be utilized to represent the reverse process.
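The continuous-time view can be illustrated by integrating the variance-preserving SDE forward with the Euler-Maruyama method. Here `beta` is a constant stand-in for a time-dependent noise schedule, and the setup is a toy illustration rather than a full implementation:

```python
import numpy as np

def euler_maruyama(x0, beta, n_steps=1000, T=1.0, rng=None):
    """Integrate the VP-SDE dx = -0.5*beta*x dt + sqrt(beta) dW forward in time."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = x0.copy()
    for _ in range(n_steps):
        x = x - 0.5 * beta * x * dt \
              + np.sqrt(beta * dt) * rng.standard_normal(x.shape)
    return x

# Starting from a point mass at 3.0, the terminal state approaches the
# standard-normal prior as beta*T grows.
rng = np.random.default_rng(0)
xT = euler_maruyama(np.full(2000, 3.0), beta=8.0, rng=rng)
```

Generation runs the corresponding reverse-time SDE (or the deterministic probability flow ODE) with the learned score in place of the true one.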
Diffusion models employ neural network architectures to capture the complex dependencies and patterns in the data. These architectures can consist of various layers, such as convolutional layers for image data or recurrent layers for sequential data. The network is trained to predict, at each noise level, the noise that corrupted the data (equivalently, the score or the denoised data). The training objective is typically a variational bound on the data log-likelihood or a related probabilistic criterion, and the model parameters are optimized to minimize the discrepancy between the generated samples and the original data distribution. Standard techniques such as gradient descent and backpropagation are employed to train the model effectively.
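As a deliberately tiny illustration of this optimization, the sketch below fits a linear noise predictor by hand-written gradient descent on a DDPM-style mean-squared error at one fixed timestep. The coefficients `a` and `b` are illustrative signal/noise weights, not a real schedule, and a real model would be a deep network trained with backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.6, 0.8                        # illustrative signal/noise coefficients
x0 = rng.standard_normal(4096)         # "clean" data
eps = rng.standard_normal(4096)        # injected noise
xt = a * x0 + b * eps                  # corrupted data at the chosen step

# Gradient descent on L(w, c) = mean((w*xt + c - eps)^2).
w, c, lr = 0.0, 0.0, 0.1
for _ in range(500):
    residual = w * xt + c - eps
    w -= lr * 2 * np.mean(residual * xt)   # dL/dw
    c -= lr * 2 * np.mean(residual)        # dL/dc
loss = np.mean((w * xt + c - eps) ** 2)
```

The linear predictor cannot recover the noise exactly (the residual variance stays positive), but the loop shows the same mechanics a deep diffusion model uses at scale: compute the noise-prediction error, backpropagate gradients, update parameters.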
Diffusion models, such as deep diffusion generative models (DDGMs), have gained prominence as strong generative models in recent years. They take a novel approach to modeling complicated data distributions by iteratively diffusing a given input towards a target distribution. However, variants of the diffusion model are needed to address specific difficulties or to improve performance in particular scenarios. The latent diffusion model (LDM) is one such variant that operates in latent space: it aims to learn the underlying data distribution by applying the diffusion process to latent variables instead of the observed data. By acting in latent space, the LDM can develop more meaningful representations and capture the underlying structure of the data distribution, enabling the efficient and effective generation of high-quality samples with desired attributes. Latent diffusion models have been used to produce varied and realistic samples in a variety of fields, including image generation, text generation, video generation, and audio synthesis.
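The latent-space idea can be sketched with a stand-in linear "autoencoder" (PCA here, purely for illustration; real LDMs use a learned autoencoder): data is encoded into a low-dimensional latent space, and the diffusion step is applied there rather than in data space:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))            # toy "images", 64-dimensional

# Hypothetical stand-in autoencoder: project onto the top 8 PCA directions.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

def encode(x):
    return x @ Vt[:8].T                       # data space -> 8-dim latent

def decode(z):
    return z @ Vt[:8]                         # latent -> data space

# Diffuse in latent space instead of data space.
z0 = encode(X)
noise = rng.standard_normal(z0.shape)
alpha_bar = 0.5                               # illustrative noise level
zt = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * noise
x_recon = decode(zt)                          # decode the (noisy) latent
```

Because the denoising network only ever sees the 8-dimensional latents rather than the 64-dimensional data, each diffusion step is far cheaper, which is the main practical appeal of operating in latent space.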
The geometry complete diffusion model (GCDM) is an extension of the diffusion model that incorporates geometric constraints and priors into the diffusion process. It leverages the underlying geometric structure of the data to guide the diffusion process, resulting in improved generation quality and better preservation of geometric properties. The GCDM takes into account geometric relationships such as distances, angles, and shape characteristics, allowing for more precise and controlled generation of samples.
The video diffusion model (VDM) is a specific type of diffusion model designed for generating videos. It extends the diffusion process to the temporal dimension, allowing for the generation of coherent and dynamic sequences of frames. The VDM progressively corrupts the video frames with noise perturbations and then learns to denoise and generate realistic video sequences. It captures the temporal dependencies and dynamics of the data distribution, enabling the generation of videos with smooth transitions and realistic motion.