A Generative Adversarial Network (GAN) typically utilizes architectures such as convolutional neural networks (CNNs). The GAN framework is composed of two neural networks: a Generator and a Discriminator. These networks play complementary roles: the generator focuses on creating new data, while the discriminator evaluates it. Read this chapter to learn about the architecture of GANs, their components, their types, and the mechanisms that make them so powerful.
The Role of Generator in GAN Architecture
The first primary part of the GAN architecture is the Generator. Let’s look at its function and structure −
Generator: Function and Structure
The primary goal of the generator is to produce new data samples that resemble real data from the dataset. It begins with a random noise vector and transforms it through a series of layers, such as fully connected (Dense) or convolutional layers, to generate a synthetic data sample.
Generator: Layers and Components
Listed below are the layers and components of the generator neural network −
- Input Layer − The generator receives a low-dimensional random noise vector as input.
- Fully Connected Layers − These layers are used to increase the dimensionality of the input noise vector.
- Transposed Convolutional Layers − Also known as deconvolutional layers, these are used for upsampling, i.e., to generate an output feature map with greater spatial dimensions than the input feature map.
- Activation Functions − Two commonly used activation functions are Leaky ReLU and Tanh. Leaky ReLU helps mitigate the dying ReLU problem, while Tanh ensures that the output stays within a specific range, typically [-1, 1].
- Output Layer − It produces the final data output, such as an image of a certain resolution.
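The layers above can be sketched as a forward pass in plain NumPy. The layer sizes, the flattened 28×28 output, and the randomly initialized weights are all hypothetical stand-ins for a trained model, not a specific implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 100-dim noise -> hidden layer -> 784-dim (28x28) "image".
NOISE_DIM, HIDDEN, OUT_DIM = 100, 256, 784

# Randomly initialized weights stand in for trained parameters.
W1 = rng.normal(0, 0.02, (NOISE_DIM, HIDDEN))
W2 = rng.normal(0, 0.02, (HIDDEN, OUT_DIM))

def leaky_relu(x, alpha=0.2):
    # Small slope for negative inputs instead of a hard zero.
    return np.where(x > 0, x, alpha * x)

def generator(z):
    h = leaky_relu(z @ W1)   # fully connected layer expands the noise vector
    return np.tanh(h @ W2)   # tanh keeps every output value in [-1, 1]

z = rng.normal(size=(1, NOISE_DIM))  # random noise vector input
fake = generator(z)
print(fake.shape)  # (1, 784)
```

A real generator for images would replace the second dense layer with transposed convolutions, but the flow — noise in, bounded synthetic sample out — is the same.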
Generator: Objective Function
The goal of the generator network is to create data that the discriminator cannot distinguish from real data. This is achieved by minimizing the generator’s loss function −
$$\mathrm{L_{G} \: = \: \log(1 \: - \: D(G(z)))}$$
Here, G(z) is the generated data and D(⋅) represents the discriminator’s output.
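To make the formula concrete, here is a minimal NumPy sketch of this loss. The discriminator outputs used below (0.9 and 0.1) are illustrative values only:

```python
import numpy as np

def generator_loss(d_of_gz):
    # L_G = log(1 - D(G(z))), averaged over a batch of discriminator outputs.
    return np.mean(np.log(1.0 - d_of_gz))

# When the discriminator is fooled (D(G(z)) near 1), the loss is lower,
# so minimizing L_G pushes the generator toward fooling the discriminator.
fooled = generator_loss(np.array([0.9]))      # log(0.1) ≈ -2.303
not_fooled = generator_loss(np.array([0.1]))  # log(0.9) ≈ -0.105
print(fooled < not_fooled)  # True
```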
The Role of Discriminator in GAN Architecture
The second primary part of the GAN architecture is the Discriminator. Let’s look at its function and structure −
Discriminator: Function and Structure
The primary goal of the discriminator is to classify input data as either real or generated by the generator. It takes a data sample as input and outputs a probability indicating whether the sample is real or fake.
Discriminator: Layers and Components
Listed below are the layers and components of the discriminator neural network −
- Input Layer − The discriminator receives a data sample from either the real dataset or the generator as input.
- Convolutional Layers − These layers downsample the input data to extract relevant features.
- Fully Connected Layers − These layers process the extracted features and make the final classification.
- Activation Functions − The discriminator uses the Leaky ReLU activation function, which introduces non-linearity and keeps gradients from vanishing for negative inputs.
- Output Layer − As the name implies, it gives a single probability value between 0 and 1 as output that indicates whether the sample is real or fake.
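The discriminator’s forward pass can likewise be sketched in NumPy. Again, the layer sizes and random weights are hypothetical placeholders for a trained network, and a dense layer stands in for the convolutional feature extractor:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: a flattened 28x28 input and one hidden layer.
IN_DIM, HIDDEN = 784, 128

W1 = rng.normal(0, 0.02, (IN_DIM, HIDDEN))
W2 = rng.normal(0, 0.02, (HIDDEN, 1))

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(x):
    h = leaky_relu(x @ W1)  # feature extraction (convolutions in a real model)
    return sigmoid(h @ W2)  # single probability in (0, 1): real vs. fake

sample = rng.normal(size=(1, IN_DIM))
p = discriminator(sample)
print(p.shape)  # (1, 1)
```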
Discriminator: Objective Function
The goal of the discriminator network is to maximize its ability to correctly distinguish real data from generated data. This is achieved by minimizing the discriminator’s loss function −
$$\mathrm{L_{D} \: = \: -(\log D(x) \: + \: \log(1 \: - \: D(G(z))))}$$
Here, x is a real data sample.
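A minimal NumPy version of this loss, using illustrative discriminator outputs, shows that a confident and correct discriminator incurs a lower loss than an uncertain one:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # L_D = -(log D(x) + log(1 - D(G(z)))), averaged over each batch.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Confident and correct: real samples scored near 1, fakes near 0.
good = discriminator_loss(np.array([0.99]), np.array([0.01]))
# Completely unsure: everything scored 0.5.
bad = discriminator_loss(np.array([0.5]), np.array([0.5]))
print(good < bad)  # True
```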
Types of Generative Adversarial Networks
We can have different types of GAN models based on the way the generator and the discriminator networks interact with each other. Here are some notable variations −
Vanilla GAN
Vanilla GAN represents the simplest form of generative adversarial networks (GANs). It provides a fundamental understanding of how GANs work. The term “Vanilla” implies that this is the simplest form without any advanced modifications or enhancements.
Deep Convolutional GANs (DCGANs)
DCGAN is one of the most popular implementations of GANs. It uses convolutional networks (ConvNets) in place of multi-layer perceptrons, together with a set of architectural guidelines that have significantly stabilized GAN training, particularly for image generation tasks.
Some of the key features of DCGANs include the use of:
- Strided Convolutions
- Batch Normalization
- The removal of fully connected hidden layers
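Of these features, batch normalization is easy to sketch in a few lines of NumPy. This is an inference-style simplification that omits the learned scale and shift parameters a real DCGAN layer would include:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature across the batch to zero mean, unit variance.
    # eps guards against division by zero for constant features.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# Toy batch of 3 samples with 2 features on very different scales.
batch = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
normed = batch_norm(batch)
print(np.allclose(normed.mean(axis=0), 0.0))  # True
```

Keeping activations on a common scale like this is part of what makes deep GAN training less prone to diverging.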
Conditional GANs (cGANs)
A Conditional GAN (cGAN) feeds additional conditioning information, such as class labels, attributes, or even other data samples, into both the generator and the discriminator. This conditioning information gives us control over the characteristics of the generated output.
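One common way to inject such conditioning information is to one-hot encode a class label and concatenate it with the noise vector before it enters the generator. The sketch below is purely illustrative, and the sizes are hypothetical:

```python
import numpy as np

NOISE_DIM, NUM_CLASSES = 100, 10  # hypothetical sizes

def condition_input(z, label):
    # One-hot encode the class label and append it to the noise vector,
    # so the generator "sees" which class it should produce.
    one_hot = np.zeros(NUM_CLASSES)
    one_hot[label] = 1.0
    return np.concatenate([z, one_hot])

z = np.random.default_rng(0).normal(size=NOISE_DIM)
conditioned = condition_input(z, label=3)
print(conditioned.shape)  # (110,)
```

The discriminator receives the same label information alongside each sample, so it can judge not only realism but also whether the sample matches its condition.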
CycleGANs
CycleGANs are designed for unpaired image-to-image translation tasks, where there is no one-to-one correspondence between input and output images. A cycle consistency loss is used to ensure that translating from one domain to another and back again reproduces the original image.
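The cycle consistency idea can be sketched with an L1 loss in NumPy. The toy translators G and F below are simple placeholder functions, not real image-to-image networks:

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    # L1 distance between x and F(G(x)): translate A -> B -> A and compare
    # the round trip against the original input.
    return np.mean(np.abs(F(G(x)) - x))

# Toy "translators": F perfectly inverts G, so the cycle loss is zero.
G = lambda x: x + 1.0  # stand-in for the A -> B generator
F = lambda x: x - 1.0  # stand-in for the B -> A generator
x = np.array([0.0, 2.0, -1.0])
print(cycle_consistency_loss(x, G, F))  # 0.0
```

During training this term is added to the usual adversarial losses, penalizing translators whose round trip distorts the input.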
Progressive GANs (ProGANs)
ProGANs generate high-resolution images by progressively increasing the resolution of both the generator and discriminator during training. With this approach, you can create more detailed and higher-quality images.
StyleGANs
StyleGANs, developed by NVIDIA, are specifically designed for generating photo-realistic, high-quality images. They introduce innovative techniques for improved image synthesis and offer better control over specific attributes of the generated images.
Laplacian Pyramid GAN (LAPGAN)
Laplacian Pyramid GAN (LAPGAN) is a type of generative adversarial network that uses a multi-resolution approach to generate high-quality images. It uses a Laplacian pyramid framework where images are generated at multiple scales.
LAPGANs are particularly effective at creating detailed and realistic images compared to standard GANs.
Conclusion
GANs enable us to create realistic data across various domains. In this chapter, we explained the architecture and mechanisms of GANs.