
Building Denoising Autoencoders: What I learnt

7 min read · Sep 14, 2024

Imagine attempting to follow a conversation in a boisterous pub. Yes, it’s difficult. Now imagine a machine trying to make sense of complex, noisy datasets in the same way. This is where Denoising Autoencoders (DAEs) can help.

These neural networks strip away unwanted noise and recover the clean signal, much like noise-cancelling headphones, but for machine learning data. In this article, I’ll explain how DAEs work and where they are used in practice, and I’ll share what I learnt from building one, to help strengthen your understanding of autoencoders.

Denoising Autoencoders

A denoising autoencoder is a special kind of autoencoder designed specifically to remove noise from data. Whereas a regular autoencoder simply learns to reconstruct its input, a denoising autoencoder is trained to reconstruct clean data from a noisy version of it. This is done by deliberately adding noise to the training inputs and training the model to recover the original, uncorrupted data.

Figure 1: Structure of Autoencoders

Process

A denoising autoencoder cleans noisy data in three steps:

Add Noise: The initial stage involves adding noise (such as salt-and-pepper noise, Gaussian noise, etc.) to the input data.

Encode: After that, this noisy input is compressed by the encoder into a smaller latent space.

Decode: By learning to ignore the noise, the decoder attempts to recreate the original, clean data from the latent representation.
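The "Add Noise" step can be sketched with NumPy for both kinds of noise mentioned above. This is a minimal illustration, assuming images are float arrays of shape (N, H, W, C) scaled to [0, 1]; the function names add_gaussian_noise and add_salt_and_pepper_noise and their default parameters are hypothetical, chosen only for this example.

```python
import numpy as np

def add_gaussian_noise(images, noise_factor=0.5, rng=None):
    """Add zero-mean Gaussian noise with std `noise_factor`, then clip to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = images + noise_factor * rng.normal(0.0, 1.0, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper_noise(images, amount=0.05, rng=None):
    """Set a random fraction `amount` of pixel locations to 0 (pepper) or 1 (salt).

    Assumes channels-last arrays of shape (N, H, W, C); a corrupted pixel
    location is set to the same value in every channel.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = images.copy()
    # One uniform draw per pixel location, shared across channels
    mask = rng.random(images.shape[:-1])
    noisy[mask < amount / 2] = 0.0                       # pepper
    noisy[(mask >= amount / 2) & (mask < amount)] = 1.0  # salt
    return noisy

# Usage on a small batch of synthetic "clean" images
rng = np.random.default_rng(0)
clean = rng.random((4, 8, 8, 3))
noisy_gauss = add_gaussian_noise(clean, noise_factor=0.3, rng=rng)
noisy_sp = add_salt_and_pepper_noise(clean, amount=0.1, rng=rng)
print(noisy_gauss.shape, noisy_sp.shape)
```

Either noisy array would then be fed to the model as input, with the untouched clean array as the training target.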

Building a Convolutional Autoencoder

In this part, I’ll guide you through the steps involved in creating a Denoising Autoencoder (DAE) for image data. Autoencoders are unsupervised neural networks trained to encode an input (such as an image) into a lower-dimensional representation and then decode it back to its original form. We will focus on a convolutional variant, because convolutions capture spatial hierarchies and are therefore very effective for image data. I used 240 × 240 pixel images for both training and testing. The following code shows the basic structure of the autoencoder:

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# 240 x 240 RGB images
input_img = Input(shape=(240, 240, 3))

# Encoder: two conv + 2x2 max-pooling stages (240 -> 120 -> 60)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder: two conv + 2x upsampling stages (60 -> 120 -> 240)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# Sigmoid keeps the reconstructed pixels in [0, 1]
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Noisy images in, clean images out (x_train_noisy and x_test_noisy are created in the noise step below)
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test_noisy, x_test))

Let’s break the architecture down layer by layer.

Input Layer:

The input layer accepts 240 × 240 pixel images with three channels (RGB colour). From here, each image is passed through the rest of the autoencoder’s layers.

Figure 2: Input Layer of Autoencoders

Here’s how we define the input layer:

input_img = Input(shape=(240, 240, 3))

Encoder:

The encoder applies a number of convolutional and pooling layers to compress the input image into a lower-dimensional latent space.

Figure 3: Encoder Layer of Autoencoders

Here’s how we build the encoder:

x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
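To see what MaxPooling2D((2, 2)) does to a feature map, here is a NumPy sketch of 2 × 2 max pooling with stride 2. It assumes a single channels-last feature map whose height and width are even (true for 240 and 120), in which case padding='same' adds no padding; max_pool_2x2 is a hypothetical helper for illustration, not a Keras function.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W, C) array with even H and W."""
    h, w, c = feature_map.shape
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even height and width")
    # Group pixels into non-overlapping 2x2 blocks, then keep the max of each block
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

# Small example: one 4x4 single-channel map
fm = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool_2x2(fm)[:, :, 0])
# [[ 5.  7.]
#  [13. 15.]]

# Shape trace through the encoder: each pooling stage halves height and width
x = np.zeros((240, 240, 32))
for stage in range(2):
    x = max_pool_2x2(x)
    print(f"after pooling stage {stage + 1}: {x.shape}")
# after pooling stage 1: (120, 120, 32)
# after pooling stage 2: (60, 60, 32)
```

Pooling changes only the spatial size; the number of channels (32) is set by the preceding Conv2D layer.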

Bottleneck

The bottleneck is the middle layer of an autoencoder, where the input is compressed into a lower-dimensional latent space. It plays a critical role: by reducing dimensionality, it forces the network to keep only the most important features and discard less important information. In a convolutional autoencoder, the bottleneck is usually a compressed feature map with reduced spatial dimensions; in the code above, it is the output of the second max-pooling layer, a 60 × 60 × 32 feature map. The size of the bottleneck affects how well the model can learn and reconstruct the data, because it sets the balance between compression and detail retention.

Figure 4: Bottleneck of autoencoders
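A quick calculation shows how much this particular bottleneck compresses each image. It only counts array values (it says nothing about what the network learns) and uses the shapes from the code above: a 240 × 240 × 3 input and a 60 × 60 × 32 bottleneck.

```python
from math import prod

input_shape = (240, 240, 3)
bottleneck_shape = (60, 60, 32)  # output of the second MaxPooling2D layer

input_size = prod(input_shape)            # values per input image
bottleneck_size = prod(bottleneck_shape)  # values per latent representation
ratio = input_size / bottleneck_size

print(f"input: {input_size:,}  bottleneck: {bottleneck_size:,}  ratio: {ratio:.2f}x")
# input: 172,800  bottleneck: 115,200  ratio: 1.50x
```

The spatial size shrinks 16-fold (240 × 240 to 60 × 60), but because the channel count grows from 3 to 32, the bottleneck holds only 1.5× fewer values than the input. Using fewer filters in the last encoder layer, or adding another pooling stage, would make the bottleneck tighter.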

Decoder:

The decoder reconstructs the original image from the compressed latent space, using convolution and upsampling layers to restore the input image’s original dimensions.

Figure 5: Decoder in Autoencoders

Here’s how we build the decoder:

x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
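UpSampling2D((2, 2)) uses nearest-neighbour interpolation by default, which simply repeats each row and each column twice. The NumPy sketch below reproduces that behaviour for a single (H, W, C) feature map; upsample_2x is a hypothetical helper for illustration.

```python
import numpy as np

def upsample_2x(feature_map):
    """Nearest-neighbour 2x upsampling of an (H, W, C) array."""
    # Repeat every row, then every column, so each value fills a 2x2 block
    return feature_map.repeat(2, axis=0).repeat(2, axis=1)

small = np.array([[1.0, 2.0],
                  [3.0, 4.0]])[:, :, None]  # shape (2, 2, 1)
print(upsample_2x(small)[:, :, 0])
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]

# Shape trace through the decoder: 60 -> 120 -> 240
x = np.zeros((60, 60, 32))
for _ in range(2):
    x = upsample_2x(x)
print(x.shape)  # (240, 240, 32)
```

Upsampling itself has no learnable parameters, so the Conv2D layer after each upsampling stage is what has to learn to turn the blocky, enlarged feature maps into a smooth reconstruction.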

Output Layer

The output layer produces the final reconstruction of the input from the compressed latent representation. In this convolutional autoencoder, the upsampling layers have already restored the original 240 × 240 spatial size, so the output layer is a final convolution that maps the 32 feature channels back to 3 RGB channels. Its sigmoid activation produces pixel values between 0 and 1, matching the range of the normalised input images. By trying to reconstruct the data as closely as possible to the original, the model learns how to restore the compressed information accurately.

Figure 6: Output Layer of Autoencoders

Here’s how we build the output layer:

decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
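With the output layer in place, the architecture is complete, and its size can be checked by hand. Each Conv2D layer has kernel_height × kernel_width × input_channels × filters weights plus one bias per filter, while the pooling and upsampling layers have no parameters. The sketch below applies that formula to the five 3 × 3 convolutional layers above; the total should match what autoencoder.summary() reports for this model.

```python
# (input_channels, filters) for each 3x3 Conv2D layer in the model, in order
conv_layers = [
    (3, 32),   # encoder conv 1
    (32, 32),  # encoder conv 2
    (32, 32),  # decoder conv 1
    (32, 32),  # decoder conv 2
    (32, 3),   # output conv (sigmoid)
]

def conv2d_params(in_channels, filters, kernel=(3, 3)):
    """Weights (kh * kw * in * out) plus one bias per filter."""
    kh, kw = kernel
    return kh * kw * in_channels * filters + filters

counts = [conv2d_params(c_in, c_out) for c_in, c_out in conv_layers]
print(counts)       # [896, 9248, 9248, 9248, 867]
print(sum(counts))  # 29507
```

The three 32 → 32 layers account for 27,744 of the 29,507 parameters.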

Steps to build a Denoising Autoencoder

  1. Add noise to the original data: Corrupt the clean input images with random perturbations such as Gaussian noise. The noisy images become the model’s inputs while the clean images remain its targets, which is what trains the network to reconstruct the original, noise-free images. Simulating the kinds of corruption that occur in real-world data makes the model better at handling noise and imperfections.
import numpy as np

# x_train / x_test: clean float images scaled to [0, 1]
noise_factor = 0.8
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
# Clip back into the valid [0, 1] pixel range
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
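The noise code assumes x_train and x_test are already float arrays scaled to [0, 1], which is also the range the sigmoid output and the binary cross-entropy loss expect. Images loaded from disk are usually 8-bit integers in [0, 255], so a conversion like the following is needed first. This is a minimal sketch; to_unit_range is a hypothetical helper, and the random arrays only stand in for real image data.

```python
import numpy as np

def to_unit_range(images_uint8):
    """Convert 8-bit images (0-255) to float32 values in [0, 1]."""
    return images_uint8.astype("float32") / 255.0

# Stand-in for real data: a few random 240 x 240 RGB images
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, 240, 240, 3), dtype=np.uint8)
x_unit = to_unit_range(raw)
print(x_unit.dtype, x_unit.min() >= 0.0, x_unit.max() <= 1.0)  # float32 True True
```

Skipping this step would let the Gaussian noise (scale 0.8) be negligible relative to pixel values in the hundreds, and np.clip(..., 0., 1.) would then wipe out almost all of the image.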

2. Build the Autoencoder: Next, build the autoencoder. As described above, it combines convolutional, max-pooling and upsampling layers.

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_img = Input(shape=(240, 240, 3))

x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)


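3. Training the model: Next, compile the model and train it with the noisy images as inputs and the clean images as targets, so that it learns to separate the noise from the underlying image. Storing the return value of fit() in history lets us plot the loss curves afterwards.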

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

history = autoencoder.fit(x_train_noisy, x_train,
                          epochs=50,
                          batch_size=128,
                          shuffle=True,
                          validation_data=(x_test_noisy, x_test))
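Because the targets are pixel intensities in [0, 1] rather than 0/1 labels, binary cross-entropy is computed per pixel and averaged. The NumPy sketch below illustrates the formula; it approximates, and is not, the Keras implementation (which also clips predictions by a small epsilon). It also shows that for non-binary targets the loss cannot reach zero: its minimum, reached when the prediction equals the target, is the entropy of the target itself. That floor can be one reason a BCE curve levels off well above zero.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy for values in [0, 1]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()

clean = np.full((2, 4, 4, 3), 0.3)  # toy "clean" targets, every pixel = 0.3

perfect = binary_crossentropy(clean, clean)                   # prediction == target
worse = binary_crossentropy(clean, np.full_like(clean, 0.7))  # every pixel off by 0.4
print(round(perfect, 3), round(worse, 3))  # perfect ≈ 0.611, worse ≈ 0.950
```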

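4. Testing the model: The last step is to test the model. We call ‘autoencoder.predict()’ on the noisy test images to get the denoised reconstructions. Because this is a reconstruction task rather than classification, there is no accuracy score; reconstruction quality is judged by the loss or by image-quality metrics instead.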

decoded_imgs = autoencoder.predict(x_test_noisy)

For my test set of 753 images, it returns a NumPy array of shape (753, 240, 240, 3).
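Since predict() returns images rather than a score, a simple sanity check is to compare reconstruction errors: the mean squared error between the denoised output and the clean images should be lower than the error of the noisy inputs themselves. The helper below is a hypothetical sketch; in practice you would pass x_test, x_test_noisy and decoded_imgs, but here synthetic arrays stand in for them, including a fake "denoised" output that is simply less noisy so the example runs.

```python
import numpy as np

def per_image_mse(a, b):
    """Mean squared error for each image in two (N, H, W, C) batches."""
    return ((a - b) ** 2).mean(axis=(1, 2, 3))

# Synthetic stand-ins for x_test, x_test_noisy and decoded_imgs
rng = np.random.default_rng(0)
clean = rng.random((3, 16, 16, 3))
noisy = np.clip(clean + 0.3 * rng.normal(size=clean.shape), 0.0, 1.0)
denoised = np.clip(clean + 0.05 * rng.normal(size=clean.shape), 0.0, 1.0)  # pretend model output

mse_noisy = per_image_mse(noisy, clean)
mse_denoised = per_image_mse(denoised, clean)
print("noisy input MSE:    ", mse_noisy.mean())
print("denoised output MSE:", mse_denoised.mean())
print("improved on every image:", bool(np.all(mse_denoised < mse_noisy)))
```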

Plotting the training and validation loss recorded in history shows how training progressed and how the reconstruction loss fell over the epochs.

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Figure 7: Loss Graph

Based on this graph, we can draw a few conclusions:

  1. Early convergence: Both training and validation losses drop sharply in the first few epochs before stabilising, so the model learns quickly and converges early.
  2. No overfitting: Throughout training, the validation loss closely tracks the training loss, indicating that the model is not overfitting to the training set.
  3. Plateau reached: Both losses level out and stay steady after roughly 10 epochs, suggesting that additional training might not provide appreciable gains.
  4. Good generalisation: The final validation loss is very close to the training loss, which suggests the model generalises well to new data.
  5. Early stopping opportunity: Given the early convergence and plateau, stopping training around epoch 10–15 could save computational resources without compromising model performance.
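The early-stopping observation can be automated. Keras provides a tf.keras.callbacks.EarlyStopping callback that monitors a metric such as val_loss and halts training after a given number of epochs (its patience) without improvement. The pure-Python sketch below mimics that patience rule on a list of validation losses so you can see where it would stop; stopping_epoch is a hypothetical helper, and the loss values are made up for illustration, not read from Figure 7.

```python
def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is where patience-based early stopping halts training,
    or len(val_losses) if it never triggers.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses that fall quickly, then plateau
losses = [0.70, 0.62, 0.60, 0.595, 0.594, 0.594, 0.595, 0.594, 0.596]
print(stopping_epoch(losses, patience=3))  # (8, 5)
```

In Keras, the equivalent would be to create, for example, early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True) and pass callbacks=[early_stop] to autoencoder.fit(). With restore_best_weights=True, the model is rolled back to the weights from its best epoch rather than keeping those from the last one.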

What Next?

Now that the autoencoder has been trained, you can assess its denoising quality with metrics such as SSIM (Structural Similarity Index) and PSNR (Peak Signal-to-Noise Ratio). You might also experiment with different kinds of noise, fine-tune the model, or use the autoencoder for other tasks such as feature extraction or anomaly detection. Finally, consider incorporating the autoencoder into larger pipelines where clean data is essential for downstream applications.
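PSNR follows directly from the mean squared error: for images scaled to [0, 1], PSNR = 10 · log10(1 / MSE), measured in decibels, with higher values meaning a closer match. Below is a minimal NumPy sketch assuming (N, H, W, C) float arrays in [0, 1]; psnr is a hypothetical helper name. If you prefer library implementations, TensorFlow provides tf.image.psnr and tf.image.ssim (both take a max_val argument, which would be 1.0 here); SSIM in particular is easier to take from a library than to write by hand.

```python
import numpy as np

def psnr(clean, reconstructed, max_val=1.0):
    """Per-image PSNR in dB for (N, H, W, C) arrays: 10 * log10(max_val^2 / MSE).

    Identical images give MSE = 0 and therefore an infinite PSNR.
    """
    mse = ((clean - reconstructed) ** 2).mean(axis=(1, 2, 3))
    return 10.0 * np.log10(max_val ** 2 / mse)

# Worked example: a uniform error of 0.1 on every pixel gives MSE = 0.01,
# so PSNR = 10 * log10(1 / 0.01) = 20 dB
clean = np.full((1, 4, 4, 3), 0.5)
shifted = clean + 0.1
print(psnr(clean, shifted))  # approximately [20.]
```

Applied to the model above, you would compare psnr(x_test, x_test_noisy) with psnr(x_test, decoded_imgs) to see how many decibels the denoising gained.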

In the end, building and training a convolutional denoising autoencoder has been incredibly instructive. I now know how to design a network architecture that efficiently compresses and reconstructs image data, paying close attention to key components such as the encoder, bottleneck, and decoder. Implementing noise addition and watching the autoencoder learn to denoise data gave me important insights into data preprocessing and model training. The project improved my grasp of neural network architecture and of how autoencoders work, and it made clear how crucial it is to experiment with different noise levels and model parameters to get the best results. All in all, this hands-on experience has strengthened my ability to build robust machine learning models and prepared me to take on more challenging data processing tasks.


Written by Akash Nath

AI/ML geek building cool stuff | 4x Hackathon Winner | Building SaaS products
