
Classification of Generative AI Models
Generative AI Model Architecture: This is the model's basic structure or design, i.e., how its layers, neural networks, and components are arranged and organized. The architecture determines how the model processes and generates information, making it a critical aspect of its functionality and of its suitability for specific tasks. Table 4 describes the architecture components and training methods used in generative AI models.
Table 4. Architecture components and training methods used in generative AI models.
| Model | Architecture Components | Training Method |
|---|---|---|
| Variational Autoencoders | Encoder–Decoder | Variational Inference |
| Generative Adversarial Networks | Generator–Discriminator | Adversarial |
| Diffusion Models | Noising (Forward)–Denoising | Iterative Refinement |
| Transformers | Encoder–Decoder | Supervised |
| Language Models | Recurrent Neural Networks | Supervised |
| Normalizing Flow Models | Coupling Layers | Maximum-Likelihood Estimation |
| Hybrid Models | Combination of Different Models | Varied |
Classifying generative models by architecture highlights the specific components and training methods that define each model, as shown in Figure 3. These architectural choices shape how a model learns from the available data and how it generates new data points. Understanding these distinctions helps researchers and practitioners select the most suitable generative model for a given task, or explore hybrid approaches that combine different models to leverage their respective strengths.

Variational autoencoders (VAEs) use an encoder–decoder architecture trained with variational inference. The encoder learns a compressed latent representation of the input data, and new samples are generated by sampling from the learned latent space and decoding.

Generative adversarial networks (GANs) consist of a generator and a discriminator trained adversarially: the generator produces synthetic samples intended to fool the discriminator. GANs excel at generating realistic and diverse data.

Diffusion models pair a forward noising process with a learned denoising process. They iteratively refine noisy inputs into high-quality samples, and training amounts to learning the dynamics of this diffusion process.

Transformers employ an encoder–decoder architecture with self-attention mechanisms that capture global dependencies. They are commonly used in tasks such as machine translation and generate coherent sequences through supervised training.

Language models, often based on recurrent neural networks (RNNs), generate sequences by predicting the next token. They are trained with supervised learning and excel at producing natural language.

Normalizing flow models apply invertible coupling layers that transform a simple base distribution into a complex one, so the density of generated samples can be evaluated exactly; they are trained via maximum-likelihood estimation.

Hybrid models combine different architectures and training methods to leverage their respective strengths, offering flexibility and generative capabilities tailored to the task by integrating elements from multiple models.
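To make the encoder–decoder structure and variational-inference training of VAEs more concrete, the following is a minimal sketch in Python using PyTorch. The library choice, layer sizes, the 784-dimensional input, and the loss formulation are illustrative assumptions rather than details drawn from the surveyed models; a complete implementation would also include a training loop over a dataset.

```python
# Minimal VAE sketch (PyTorch): encoder-decoder trained by minimizing the negative ELBO.
# Layer sizes and the 784-dimensional input (e.g., a flattened 28x28 image) are
# illustrative assumptions, not specifics taken from the surveyed models.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoder: maps the input to the mean and log-variance of q(z|x).
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.enc_mu = nn.Linear(hidden_dim, latent_dim)
        self.enc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: maps a latent sample z back to the data space.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps, so gradients
        # can flow through the sampling step.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def elbo_loss(recon, x, mu, logvar):
    # Negative ELBO: reconstruction error plus KL divergence from the prior N(0, I).
    recon_err = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl

# Generating new samples: draw z from the latent prior and decode it.
model = VAE()
with torch.no_grad():
    z = torch.randn(16, 20)     # 16 samples from the latent prior
    samples = model.dec(z)      # shape: (16, 784)
```

Training would iterate over a dataset and minimize elbo_loss with a standard optimizer. A GAN, by contrast, would replace this single objective with the alternating generator and discriminator updates described above, and a diffusion model would replace it with a loop of noising and learned denoising steps.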
Figure 3. Classification of generative AI models based on architecture.
Source: Ajay Bandi, Pydi Venkata Satya Ramesh Adapa, and Yudu Eswar Vinay Pratap Kumar Kuchi, https://www.mdpi.com/1999-5903/15/8/260
This work is licensed under a Creative Commons Attribution 4.0 License.