Let’s Generate Images using GAN

Oct 6, 2024

Introduction

Ever wanted to generate cool images using AI? No, I am not talking about an existing SaaS product or API; I am talking about building the network YOURSELF! Today I will walk you through the steps needed to build a working Generative Adversarial Network (GAN). So, without further ado, let’s dive in.

Generative Adversarial Network

A Generative Adversarial Network (GAN) has two main parts: the Generator and the Discriminator. The Generator produces an image from a point in a low-dimensional latent space, and the Discriminator then checks whether that image is REAL or FAKE, based on what it has learned from real, high-dimensional image data.

Based on the Discriminator’s feedback, we move to the backpropagation step, where we adjust the weights and biases of the network so that the generated images look real enough for even the Discriminator to label them as real. Over the course of training, the Generator gets better at producing images and the Discriminator gets better at deciding whether an image is real.
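For reference, this tug-of-war is usually written as a two-player minimax game. The formula below is the standard formulation from the GAN literature, not something derived in this post. The symbols are assumptions introduced here: $G$ is the Generator, $D$ is the Discriminator (returning the probability that its input is real), $x$ is a real image, and $z$ is a latent noise vector.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator tries to make both terms large, while the Generator tries to make the second term small by producing images that $D$ scores as real.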

Building GAN

Building a GAN is not that hard. The only prerequisites are: 1. Basic Python, 2. Basic concepts of Neural Networks.

Necessary Imports

To build the GAN, we will use plain TensorFlow and Keras. We import them, along with a few other libraries, before moving on to the next step:

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

Data Loading and Preprocessing

In this project I will use the Fashion MNIST dataset as the training data. Fashion MNIST contains 70,000 greyscale images (60,000 training images and 10,000 test images) of clothing and other fashion items, each 28 x 28 px. Only the training images are used here.

To load the data, we will use the built-in Keras function:

(train_images, _), (_, _) = tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
train_images = (train_images - 127.5) / 127.5

BUFFER_SIZE = 60000
BATCH_SIZE = 256

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

We load only the image arrays, because a GAN is trained without labels (unsupervised learning). After loading, we scale each pixel value to the range [-1, 1]. We then build a dataset from the training images alone. To keep the order of the data from biasing training, the train_images array is shuffled randomly, and the shuffled data is split into batches of 256 images each. Batching lets us feed the data to the network in manageable pieces, which keeps memory usage in check and speeds up training.
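To make the arithmetic concrete, here is a small plain-Python sketch of the scaling and batching described above. The helper names (normalize, denormalize, batch_sizes) are made up for this illustration and are not part of TensorFlow. It assumes the default behaviour of batch(), which keeps the final, smaller batch.

```python
def normalize(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return (pixel - 127.5) / 127.5


def denormalize(value):
    """Inverse mapping, from [-1, 1] back to [0, 255]."""
    return value * 127.5 + 127.5


def batch_sizes(num_samples, batch_size):
    """Sizes of the batches when the last partial batch is kept."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(60000, 256)

print(normalize(0), normalize(127.5), normalize(255))  # -1.0 0.0 1.0
print(len(sizes), sizes[-1])                           # 235 96
```

So one epoch consists of 235 batches, and the last one holds only 96 images instead of 256.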

Generator Network

The Generator network in a Generative Adversarial Network produces images from low-dimensional data. It takes a random noise vector as input and creates an image that resembles the real data. We start with a fully connected layer, reshape its output, and then use several Conv2DTranspose layers to upsample the image to the desired size.

def make_generator_model():
    model = tf.keras.Sequential([
        layers.Dense(7*7*256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        
        layers.Reshape((7, 7, 256)),
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh')
    ])
    return model

The process begins with a dense layer that maps a 100-dimensional noise vector to 7*7*256 = 12,544 values, which serve as a low-resolution draft of the image. Batch normalisation is then applied to stabilise training, followed by the Leaky ReLU activation function, which introduces non-linearity and keeps neurones from going dormant.

After the output is reshaped into a 7x7 image with 256 channels, transposed convolutional layers (Conv2DTranspose) gradually increase the image size while decreasing the depth. The first transposed layer keeps the size at 7x7 but reduces the number of channels to 128, and the next layer expands the image to 14x14 with 64 channels. The last layer upsamples once more and yields a 28x28x1 image, the same size as the Fashion MNIST images. Its tanh activation keeps pixel values between -1 and 1, matching the normalised training images.
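The shape progression can be checked by hand. The sketch below is plain Python, not Keras. It relies on the fact that a Conv2DTranspose layer with padding='same' multiplies the spatial size by its stride. The helper name transposed_conv_same is hypothetical.

```python
def transposed_conv_same(size, stride):
    """Spatial output size of Conv2DTranspose with padding='same'."""
    return size * stride


# (filters, stride) of the three Conv2DTranspose layers in make_generator_model
layer_specs = [(128, 1), (64, 2), (1, 2)]

size, channels = 7, 256            # shape after Reshape((7, 7, 256))
shapes = [(size, size, channels)]
for filters, stride in layer_specs:
    size = transposed_conv_same(size, stride)
    shapes.append((size, size, filters))

print(shapes)
# [(7, 7, 256), (7, 7, 128), (14, 14, 64), (28, 28, 1)]
```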

Discriminator Network

The Discriminator works as an examiner: it inspects an image and decides whether it is real (from the dataset) or generated.

def make_discriminator_model():
    model = tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[28, 28, 1]),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        
        layers.Flatten(),
        layers.Dense(1)
    ])
    return model

Determining if an image is real (from the training dataset) or fake (produced by the generator) is the main job of the discriminator. It begins with a 2D convolutional layer that scans over the input pictures (28x28 pixels with 1 channel for greyscale) using 64 filters of size 5x5 in order to identify features like edges and textures. By reducing the output size by half, the strides=(2, 2) argument downsamples the image. Next, a Leaky ReLU activation function is employed, which mitigates the problems associated with inactive neurones and permits non-linear transformations. To avoid overfitting, a dropout layer with a 30% rate is incorporated, which randomly sets certain neurones to zero during training.

The model then adds a second convolutional layer with 128 filters and applies Leaky ReLU and dropout in the same way. After these convolutional layers, the output is flattened into a one-dimensional array for the final layer. Finally, a dense layer produces a single value per image. Because Dense(1) has no activation, this value is a raw score (a logit) rather than a probability: larger values mean the image looks more authentic, and passing the score through a sigmoid would give a probability (nearer to 1 for real, nearer to 0 for fake). With this sequence of convolutional, activation, and dropout layers, the discriminator learns to tell real images from generated ones.
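Two details of the discriminator are easy to verify with plain Python. First, a Conv2D layer with padding='same' and stride 2 halves the spatial size (rounding up). Second, the final Dense(1) layer has no activation, so its output is an unbounded score (logit) that only becomes a probability after a sigmoid. The helper names below are made up for this sketch.

```python
import math


def conv_same(size, stride):
    """Spatial output size of Conv2D with padding='same'."""
    return math.ceil(size / stride)


def sigmoid(logit):
    """Turn a raw discriminator score into a probability of 'real'."""
    return 1.0 / (1.0 + math.exp(-logit))


size = 28
size = conv_same(size, 2)        # after the first Conv2D:  14
size = conv_same(size, 2)        # after the second Conv2D: 7
flattened = size * size * 128    # input width of the final Dense(1)
print(size, flattened)           # 7 6272

for logit in (-2.0, 0.0, 2.0):
    print(logit, round(sigmoid(logit), 3))
# -2.0 0.119
# 0.0 0.5
# 2.0 0.881
```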

Loss Functions

Both the generator and the discriminator use binary cross-entropy as the loss function. The generator wants to “fool” the discriminator, so its loss measures how well it can make the discriminator classify fake images as real. The discriminator’s loss is calculated based on how well it can distinguish between real and fake images.

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss
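To build some intuition for these two losses, here is a plain-Python stand-in for binary cross-entropy on logits, averaged over a batch. It is a sketch for illustration only: the toy_ function names and the example logits are invented, and it mirrors the math rather than the Keras implementation.

```python
import math


def bce_from_logit(label, logit):
    """Binary cross-entropy for one example, computed from a raw logit.

    Numerically stable form of
    -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


def toy_generator_loss(fake_logits):
    # The generator wants its fakes to be labelled 1 (real).
    return mean([bce_from_logit(1.0, x) for x in fake_logits])


def toy_discriminator_loss(real_logits, fake_logits):
    real = mean([bce_from_logit(1.0, x) for x in real_logits])
    fake = mean([bce_from_logit(0.0, x) for x in fake_logits])
    return real + fake


# Hypothetical scores from a discriminator that is confidently correct
real_logits, fake_logits = [3.0, 3.0], [-3.0, -3.0]
print(round(toy_discriminator_loss(real_logits, fake_logits), 4))  # 0.0972
print(round(toy_generator_loss(fake_logits), 4))                   # 3.0486

# Hypothetical scores from a discriminator that has been fooled
fooled_logits = [3.0, 3.0]
print(round(toy_generator_loss(fooled_logits), 4))                 # 0.0486
```

When the discriminator is confidently right, its own loss is small and the generator’s loss is large; when it is fooled, the situation flips. An undecided discriminator (logit 0) gives a per-term loss of ln 2, about 0.693.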

Optimizers

We use the Adam optimizer for both the generator and the discriminator, with a small learning rate of 1e-4 that helps keep training stable.

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
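One reason a small learning rate keeps updates gentle is that the size of an Adam step is governed by the learning rate rather than by the raw gradient magnitude. The sketch below is a textbook Adam update for a single scalar parameter, written in plain Python. It is not the Keras source code, and the beta and epsilon values are assumed to be the usual defaults.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First update from the same starting point, with a large and a small gradient
big, _, _ = adam_step(param=1.0, grad=5.0, m=0.0, v=0.0, t=1)
small, _, _ = adam_step(param=1.0, grad=0.05, m=0.0, v=0.0, t=1)

step_big = 1.0 - big
step_small = 1.0 - small
print(step_big, step_small)  # both very close to 1e-4
```

Even though the two gradients differ by a factor of 100, both first steps move the parameter by roughly the learning rate.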

Training Procedure

In each training step, the generator and the discriminator are updated together. For every batch, the generator creates fake images from random noise, the discriminator evaluates both the real and the fake images, and the losses for both networks are computed.

# Create the two networks once; train_step refers to them as globals.
generator = make_generator_model()
discriminator = make_discriminator_model()

@tf.function
def train_step(images):
    # One noise vector of length 100 per generated image
    noise = tf.random.normal([BATCH_SIZE, 100])

    # Separate tapes so each network gets the gradients of its own loss
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    # Return both losses so that train() can log them
    return gen_loss, disc_loss

def generate_and_save_images(model, epoch, test_input):
    # training=False runs BatchNormalization in inference mode
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(4, 4))

    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i+1)
        # Rescale from [-1, 1] back to [0, 255] for display
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')

    plt.savefig(f'image_at_epoch_{epoch:04d}.png')
    plt.show()

# Assumption: 16 fixed noise vectors, one per cell of the 4x4 grid,
# reused every epoch so the saved images are comparable.
seed = tf.random.normal([16, 100])

# Average losses per epoch, kept for later inspection
generator_losses = []
discriminator_losses = []

def train(dataset, epochs):
    for epoch in range(epochs):
        gen_loss_epoch = []
        disc_loss_epoch = []

        for image_batch in dataset:
            gen_loss, disc_loss = train_step(image_batch)
            gen_loss_epoch.append(gen_loss.numpy())
            disc_loss_epoch.append(disc_loss.numpy())

        generator_losses.append(np.mean(gen_loss_epoch))
        discriminator_losses.append(np.mean(disc_loss_epoch))

        generate_and_save_images(generator, epoch + 1, seed)

        print(f'Epoch {epoch+1} completed: Generator Loss = {np.mean(gen_loss_epoch)}, Discriminator Loss = {np.mean(disc_loss_epoch)}')

The train_step function carries out one training iteration for both the generator and the discriminator. It first creates a batch of random noise vectors, which are fed into the generator to produce artificial images. Two gradient tapes are created inside the function to record the generator’s and the discriminator’s operations independently. The generator turns the noise into generated images, while the discriminator evaluates both the generated images and the real images (provided as input). The losses are calculated with each model’s own loss function. The gradients of each loss are then computed from the tapes, and the optimisers use them to update the weights of the generator and the discriminator.

The generate_and_save_images function displays and saves the generator’s images at a given training epoch. It takes three arguments: the generator model, the current epoch number, and a fixed batch of random noise (seed) that keeps the output comparable across epochs. The function generates images from the seed and displays them on a 4x4 subplot grid. For display, each generated image is rescaled from [-1, 1] back to the original pixel range of 0 to 255. The figure is then saved to disk with a filename that includes the epoch number, which makes it easy to follow how the generator improves over time.

The train function coordinates the whole training procedure over a given number of epochs. At the start of each epoch, it initialises lists to hold the generator and discriminator losses for that epoch. It then iterates over the batches of the training dataset, calling train_step on each batch and recording the returned losses. Once all batches have been processed, it computes the average losses for the epoch and appends them to the corresponding lists for later analysis. The function also calls generate_and_save_images after every epoch, so you can see how the generated images improve over time. Finally, it prints the average generator and discriminator losses at the end of each epoch to help track training progress.
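A small detail worth noting is the :04d in the filename. The plain-Python sketch below shows what it produces and why the zero padding is handy; image_filename and unpadded_filename are hypothetical helpers that simply repeat or vary the f-string used in generate_and_save_images.

```python
def image_filename(epoch):
    """Zero-padded filename, as built in generate_and_save_images."""
    return f'image_at_epoch_{epoch:04d}.png'


def unpadded_filename(epoch):
    """Same name without zero padding, for comparison."""
    return f'image_at_epoch_{epoch}.png'


epochs = [2, 10, 100]

padded = [image_filename(e) for e in epochs]
print(padded)
# ['image_at_epoch_0002.png', 'image_at_epoch_0010.png', 'image_at_epoch_0100.png']

# Padded names sort alphabetically in epoch order; unpadded names do not.
print(sorted(padded) == padded)                                     # True
print(sorted(unpadded_filename(e) for e in epochs))
# ['image_at_epoch_10.png', 'image_at_epoch_100.png', 'image_at_epoch_2.png']
```

Because the files sort in epoch order, it is easy to flip through them or stitch them into an animation later. Note also that the 4x4 grid has room for at most 16 images, so the fixed noise batch should contain no more than 16 vectors.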

Generating Images during Training

We generate and save images at the end of every epoch in order to monitor the GAN’s progress. Using a fixed random noise vector keeps the images comparable from epoch to epoch. The improvement in generated image quality after 100 epochs can be seen below.

Figure 3: Generated Image quality comparison

Conclusion

By the end of training, the generator learns to produce images that resemble the Fashion MNIST data, although there is still plenty of room for improvement. This project illustrates the basic ideas behind GANs, and with minor adjustments you can use the same approach to generate other kinds of data or even higher-quality images.

And that is how we generate images using Generative Adversarial Networks (GANs).