
How I built an Autoencoder with ONLY numpy

Jan 18, 2025 · 19 min read

Have you ever wondered how AI can learn to compress and reconstruct data — just like how your brain processes images and sounds? Picture this: You give a machine a massive image, and it figures out how to squish it down into a compact version, keeping only the essential details. But when you ask it to restore the image, it brings it back almost exactly as it was, with remarkable accuracy. Sounds like magic, right? Well, it’s all thanks to autoencoders, a powerful concept in deep learning.

Let me take you on a journey through the fascinating world of autoencoders, diving into the project I recently completed using the MNIST dataset, where I built an autoencoder from scratch. This blog will explain everything you need to know about autoencoders, step by step, with some cool illustrations and examples along the way.

What is an Autoencoder?

At its core, an autoencoder is a type of neural network used for unsupervised learning. Imagine a data compression tool — just like how WinRAR compresses files to save space but still allows you to open them later. In the same way, an autoencoder learns to compress data (like images) into a smaller “encoded” representation and then reconstruct it back to its original form.

Figure 1: Visual Representation of an Autoencoder

Here’s the magic:

  1. Encoder: It takes the input and compresses it into a smaller, more efficient representation (called the latent space).
  2. Decoder: It then takes that compressed version and reconstructs the original data from it.

The real beauty of autoencoders is their ability to learn this compression and decompression by themselves, without needing explicit labels for training. They discover the most important features from the data on their own.

A Deep Dive Into the MNIST Dataset

The MNIST dataset is the quintessential benchmark for beginners in machine learning and computer vision. It contains 60,000 training images (and 10,000 test images) of handwritten digits (0–9), each 28x28 pixels in grayscale. It’s perfect for testing out a neural network, as the data is simple but still offers enough complexity for a meaningful experiment. For my project I used a CSV version of this dataset, where each row holds the digit label followed by the 784 pixel values of one image.
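To get a feel for the CSV layout, here is a minimal sketch (assuming the first column is the label and the file lives at data/mnist_train.csv, matching the paths used later) that reads a single row and displays it as a 28x28 image:

import pandas as pd
import matplotlib.pyplot as plt

# Read just the first row: column 0 is the digit label, columns 1-784 are pixel values (0-255)
sample = pd.read_csv('data/mnist_train.csv', nrows=1)
label = sample.iloc[0, 0]
pixels = sample.iloc[0, 1:].values.reshape(28, 28)

plt.imshow(pixels, cmap='gray')
plt.title(f'Label: {label}')
plt.show()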

What Did I Do?

In my project, I built an autoencoder that works on the MNIST dataset with these steps:

  1. Import Necessary Libraries: First, I imported Pandas to load and preprocess the data, NumPy to build the whole autoencoder, and Matplotlib to visualize the outputs.
  2. Preprocessing the Data: Next, I loaded the MNIST dataset and preprocessed it. The images are flattened to 784 dimensions (28x28 = 784 pixels), and the pixel values are scaled between 0 and 1. This is a standard practice in deep learning to ensure that the model trains faster and more effectively.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSVs and drop the label column (the autoencoder only needs the pixels)
train_data = pd.read_csv('data/mnist_train.csv')
test_data = pd.read_csv('data/mnist_test.csv')

train_data = train_data.drop('label', axis=1)
test_data = test_data.drop('label', axis=1)

# Flatten to (num_samples, 784) and scale pixel values to [0, 1]
train_data = train_data.values.reshape(-1, 784)
test_data = test_data.values.reshape(-1, 784)

train_data = train_data / 255.0
test_data = test_data / 255.0
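As a quick sanity check (not part of the original notebook, but useful before training), you can confirm the shapes and value ranges:

print(train_data.shape)                     # expected: (60000, 784)
print(test_data.shape)                      # expected: (10000, 784)
print(train_data.min(), train_data.max())   # expected: 0.0 1.0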

3. Activation Functions: We used two well-known activation functions for this project: ReLU and Sigmoid.

ReLU (Rectified Linear Unit): In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the non-negative part of its argument. (Source: Wikipedia)

In simple terms: it keeps only the positive values and sets negative values to zero, making the model faster and helping it learn complex patterns. (In the code below I actually use a leaky variant, which keeps a small slope of 0.01 for negative inputs instead of setting them to exactly zero.)

Figure 2: ReLU (Source: Wikipedia)
Figure 3: Graphical representation of the ReLU activation function (Source: Wikipedia)

Sigmoid: A sigmoid function is any mathematical function whose graph has a characteristic S-shaped or sigmoid curve. (Source: Wikipedia)

In simple terms: It squashes any input value into a range between 0 and 1. It works like a smooth “S” shape, where very large inputs become close to 1, very small (negative) inputs become close to 0, and values around 0 are mapped near the middle, around 0.5. This makes it useful for tasks like probability prediction or binary classification.

Figure 4: Sigmoid Function (Source: Wikipedia)
Figure 5: Visual representation of the Sigmoid function (Source: Wikipedia)

def relu(x):
    return np.maximum(0.01 * x, x)  # Leaky ReLU: keep a small slope for negative inputs

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

4. Loss Function: Binary Cross-Entropy (BCE) loss is a loss function used in binary classification tasks to measure the difference between the predicted probabilities and the actual labels (0 or 1). It calculates the negative average of the log-likelihood of the true labels given the predicted probabilities. Mathematically, for n samples, it is defined as:

BCE = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

Here, y_i is the true label (0 or 1), and \hat{y}_i is the predicted probability. BCE penalizes wrong predictions more heavily when the predicted probability is confident but incorrect, making it ideal for binary classification problems.

def bce_loss(y_true, y_pred):
    # Clip predictions away from 0 and 1 to avoid taking log(0)
    epsilon = 1e-12
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)

    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss

5. Calculating Accuracy:

def calculate_accuracy(y_true, y_pred, threshold=0.1):
    diff = np.abs(y_true - y_pred)
    correct_features = (diff < threshold).astype(int)
    sample_accuracy = np.mean(correct_features, axis=1)
    overall_accuracy = np.mean(sample_accuracy) * 100
    return overall_accuracy

This function measures the accuracy of predictions by comparing them to the true values within a predefined threshold. The idea is to check whether the predicted values are close enough to the actual values, allowing for some margin of error (defined by the threshold).

For each prediction, the absolute difference between the predicted value and the actual value is calculated. If this difference is smaller than the threshold, the prediction is considered correct. Accuracy is then calculated at two levels:

  • Feature-level accuracy: The proportion of correctly predicted features within each sample.
  • Overall accuracy: The average accuracy across all samples, expressed as a percentage.

By setting a small threshold, this method ensures that predictions are not only correct but also precise, making it useful for tasks where slight deviations matter.
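For intuition, here is a tiny, hypothetical example (the numbers are made up) of how the threshold works:

y_true = np.array([[0.00, 0.50, 1.00]])
y_pred = np.array([[0.05, 0.70, 0.98]])
# absolute differences: [0.05, 0.20, 0.02] -> with threshold=0.1, 2 of 3 features count as correct
print(calculate_accuracy(y_true, y_pred, threshold=0.1))  # roughly 66.67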

6. Initialization of Parameters: The initialization of parameters is a vital step in constructing a neural network, as it sets the starting point for learning. In the provided example, the weights (W_{1}) are initialized by drawing from a standard normal distribution and scaling by \sqrt{\frac{2}{\text{input size}}}. This scaling is based on the He initialization method, which helps maintain stable gradients during training, particularly when using ReLU activation functions. The biases (b_{1}) are initialized to zero, ensuring that the initial output of the network is unbiased. These initialized parameters serve as the foundation for the training process.

def init_params(input_size, output_size):
    # He initialization: scale by sqrt(2 / fan_in); biases start at zero
    W1 = np.random.randn(output_size, input_size) * np.sqrt(2. / input_size)
    b1 = np.zeros((output_size, 1))
    return W1, b1

7. Defining the Dense Layer:

The dense layer in a neural network is a fully connected layer where each neuron is connected to every neuron in the previous layer. In the provided example, the dense layer is implemented by performing a matrix multiplication between the input (x) and the weights (w) followed by adding the bias (b). This operation calculates the weighted sum of inputs and applies the bias, which serves as the output for the layer. The result is then passed to the next layer in the network, where activation functions or other operations can be applied. This process forms the core of a neural network’s forward pass.

def dense(x, w, b):
    return np.dot(x, w) + b

8. Building the Optimizer: A Beginner-Friendly Explanation

In machine learning, training a model means adjusting its parameters (such as weights and biases) to minimize the error (loss) between the predicted and actual values. Optimizers are algorithms that help in this process by adjusting the model’s parameters to reduce the loss step by step. One of the most commonly used optimizers is Adam (Adaptive Moment Estimation). Let’s break down how it works with a beginner-friendly explanation and the math behind it.

What is Adam Optimizer: Adam is an advanced optimization algorithm that combines two ideas:

  1. Momentum (which helps accelerate learning in the right direction).
  2. Scaling the learning rate for each parameter (which helps adapt the learning rate during training).

It uses two important things to update the parameters:

  • First moment estimate: The average of the past gradients (like momentum).
  • Second moment estimate: The average of the squared gradients (like scaling based on how large the gradient is).

These moments are used to update the parameters in a way that ensures faster convergence and avoids issues like overshooting or slow learning.

Mathematics Behind Adam Optimizer: Let’s go over the math involved in the Adam optimizer:

i. First Moment (m): This is the running average of the gradients:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t

where:

  • m_{t} is the updated first moment (mean of gradients).
  • \beta_1 is a constant (typically 0.9).
  • g_{t} is the gradient at time step t.
  • m_{t-1} is the previous value of the first moment.

ii. Second Moment (v): This is the running average of the squared gradients:

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2

where:

  • v_t is the updated second moment (mean of squared gradients).
  • \beta_2 is a constant (typically 0.999).

iii. Bias Correction: Since m_{t} and v_{t} are initialized as zeros, the estimates are biased toward zero, especially during the early stages of training. To correct this, we apply bias correction:

\hat{m_t} = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v_t} = \frac{v_t}{1 - \beta_2^t}

where:

  • t is the current time step.
  • \hat{m_t} and \hat{v_t} are the bias-corrected first and second moments.

iv. Parameter Update: Once we have the corrected moments, the parameters (weights \theta) are updated as:

\theta_{t+1} = \theta_t - \alpha \frac{\hat{m_t}}{\sqrt{\hat{v_t}} + \epsilon}

where:

  • \theta_t is the current parameter (weight).
  • \alpha is the learning rate (typically 0.001).
  • \epsilon is a small constant to avoid division by zero (usually 1 \times 10^{-8}).

This update rule helps in adjusting the parameters by considering both the past gradients (momentum) and the squared gradients (adaptive learning rate). It allows Adam to make updates that are well-adjusted for each parameter, ensuring fast and stable training.

Why Use Adam: Adam is widely used because it adapts to different kinds of problems and can work well with sparse gradients (where only a few features are important). It is often more efficient and requires less tuning compared to other optimizers like gradient descent. Here’s why Adam is effective:

  1. It adapts the learning rate for each parameter.
  2. It considers both the momentum (past gradients) and the scaling of gradients (variance).
  3. It works well even with large datasets and non-stationary objectives.

class AdamOptimizer:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m_w = 0
        self.v_w = 0
        self.m_b = 0
        self.v_b = 0
        self.t = 0

    def update(self, param, grad, m, v):
        self.t += 1
        # Update first and second moment estimates
        m = self.beta1 * m + (1 - self.beta1) * grad
        v = self.beta2 * v + (1 - self.beta2) * (grad ** 2)

        # Bias correction
        m_hat = m / (1 - self.beta1 ** self.t)
        v_hat = v / (1 - self.beta2 ** self.t)

        # Parameter update
        param -= self.lr * m_hat / (np.sqrt(v_hat) + self.epsilon)
        return param, m, v
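To see the optimizer in isolation, here is a small, hypothetical single-step example (the parameter and gradient values are made up):

opt = AdamOptimizer(lr=0.001)
w = np.array([[0.5, -0.3]])
grad = np.array([[0.1, -0.2]])
m = np.zeros_like(w)
v = np.zeros_like(w)

# One Adam step: each weight moves a small step against its gradient
w, m, v = opt.update(w, grad, m, v)
print(w)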

9. Building the Encoder: In machine learning, particularly in neural networks like autoencoders, Encoders play a crucial role in transforming input data into a form that is easier for the model to understand and work with. The encoder takes the input data, processes it, and compresses it into a smaller, more abstract representation, typically referred to as the latent space or feature space. Let’s break down how the encoder works and understand its components step by step.

What Does the Encoder Do: The Encoder’s primary goal is to learn how to compress data (like an image, text, or any high-dimensional input) into a more compact and informative representation. This representation is smaller, easier to handle, and contains the most important features of the data. The Encoder passes the data through a series of transformations until it is encoded into this reduced representation.

Components of the Encoder:

i. Weights and Biases: These are parameters that the model learns during training to adjust how the data is transformed.

  • The weights control how each input feature affects the output.
  • The biases help shift the outputs to fit the data better.

In the code, the encoder uses these weights and biases during the forward pass to process the input data.

ii. Optimizer: The optimizer helps adjust the weights and biases to minimize the error during training. In the encoder, we use the Adam optimizer to do this efficiently.

Workflow of the Encoder: Now, let’s walk through the steps involved in the Encoder’s forward and backward passes:

a. Initialization:

  • The encoder is initialized with the input size, hidden size, and optimizer. It also initializes the weights (W_1) and biases (b_1) for the first layer, which will be used to process the data.
  • The initial moment estimates (used by the optimizer) are also set to zero.

b. Forward Pass:

  • During the forward pass, the encoder processes the input data (x) through a series of transformations:
  • Linear Transformation: First, the input data (x) is passed through a dense layer, which performs a linear transformation:

z_1 = W_1 x + b_1

where:

  • W_1 are the weights,
  • b_1 is the bias,
  • x is the input data.
  • Activation Function (ReLU): The result of the linear transformation (z_1) is then passed through the ReLU activation function, which introduces non-linearity:

a_1 = \text{ReLU}(z_1) = \max(0, z_1)

The ReLU function keeps positive values as they are, while it sets negative values to zero. This helps the model learn complex patterns.

The final output of the encoder (a_1) is the encoded representation of the input data, which contains the important features in a compressed form.

c. Backward Pass:

  • The backward pass is used to calculate the gradients (or derivatives) that tell the model how to update its parameters (weights and biases) to minimize the loss (error). The key steps are:
  • Compute Gradients: The encoder calculates the gradient of the loss with respect to the weights and biases. These gradients represent how much each weight and bias contributed to the error.
  • Derivative of ReLU: The encoder uses the derivative of the ReLU activation to adjust the gradients. This derivative, which is 1 for positive inputs and 0 otherwise, tells us how the activation function behaves with respect to the input.
  • Calculate Gradients for Weights and Biases: The gradients are then used to compute how much the weights and biases need to change to reduce the error. This is done using the chain rule in calculus.

d. Update Weights and Biases:

  • After calculating the gradients, the optimizer (Adam) is used to update the weights and biases. Adam takes into account the past gradients (momentum) and the squared gradients (adaptive learning rates), helping the model update its parameters more efficiently.
  • The update formula for weights and biases is:

W_1 \leftarrow W_1 - \alpha \frac{\hat{m_1}}{\sqrt{\hat{v_1}} + \epsilon}

where:

  • W_1 is the weight matrix,
  • \alpha is the learning rate,
  • \hat{m_1} and \hat{v_1} are the bias-corrected moment estimates from Adam.

class Encoder:
    def __init__(self, input_size, hidden_size, optimizer):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.optimizer = optimizer
        self.W1, self.b1 = init_params(input_size, hidden_size)
        # Adam moment estimates for the weights and biases
        self.m_w1 = self.v_w1 = np.zeros_like(self.W1)
        self.m_b1 = self.v_b1 = np.zeros_like(self.b1)

    def forward(self, x):
        self.x = x
        self.z1 = dense(x, self.W1.T, self.b1.T)   # linear transformation
        self.a1 = relu(self.z1)                    # non-linearity
        return self.a1

    def backward(self, y):
        m = y.shape[0]
        dz1_raw = np.dot(self.W1, self.x.T)
        self.dz1 = dz1_raw.T * (self.a1 > 0)       # mask by ReLU activity
        self.dW1 = np.dot(self.dz1.T, self.x) / m
        self.db1 = np.sum(self.dz1, axis=0, keepdims=True).T / m

    def update(self):
        self.W1, self.m_w1, self.v_w1 = self.optimizer.update(self.W1, self.dW1, self.m_w1, self.v_w1)
        self.b1, self.m_b1, self.v_b1 = self.optimizer.update(self.b1, self.db1, self.m_b1, self.v_b1)
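A small smoke test (not part of the original training script, and using random data purely for illustration) confirms that the encoder compresses a batch of 784-dimensional inputs down to 256 features:

opt = AdamOptimizer(lr=0.001)
enc = Encoder(784, 256, opt)

batch = np.random.rand(32, 784)   # 32 fake flattened "images"
encoded = enc.forward(batch)
print(encoded.shape)              # (32, 256)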

10. Building the Decoder: The Decoder is the second part of the autoencoder architecture. It takes the compressed and abstract representation generated by the Encoder and tries to reconstruct the original input data. The decoder’s job is to expand the compressed data back to its original form or some approximated version of it.

Components of the Decoder:

i. Weights and Biases: Like the encoder, the decoder has weights and biases that are learned during training. These parameters determine how the decoder reconstructs the data.

ii. Optimizer: The optimizer, in this case, the Adam optimizer, adjusts the weights and biases during training to minimize the reconstruction error.

Workflow of the Decoder: Now, let’s go step-by-step through the operations of the decoder:

i. Initialization:

  • When the Decoder is initialized, it receives the hidden size (size of the encoded representation) and output size (size of the original data) as input.
  • The decoder’s parameters, like weights (W_2) and biases (b_2), are initialized using the init_params() function.
  • The moment estimates for the optimizer are also initialized to zero.

ii. Forward Pass:

  • The forward pass takes the encoded representation (x) from the encoder and processes it to reconstruct the data.
  • First, the input data is passed through a dense layer (just like the encoder):

z_2 = W_2 x + b_2

where:

  • W_2 is the weight matrix,
  • b_2 is the bias,
  • x is the input from the encoder (encoded data).
  • Next, the output of this linear transformation (z_2) is passed through a sigmoid activation function:

a_2 = \sigma(z_2) = \frac{1}{1 + e^{-z_2}}

The sigmoid function squashes the output into a range between 0 and 1, which matches the normalized pixel values (also between 0 and 1) that the decoder is trying to reconstruct.

  • The final output (a_2) is the decoder’s reconstruction of the original data, which should be as close as possible to the input data.

iii. Backward Pass:

  • During the backward pass, the decoder computes the gradients of the loss with respect to its parameters (weights and biases). The steps involved are:
  • Compute the error (difference): The difference between the decoder’s output (a_2) and the actual target values (y) is calculated:

\delta_2 = a_2 - y

This error (\delta_2) tells us how far the predicted output is from the true values.

  • Calculate gradients: Using the error, we compute the gradients with respect to the weights (W_2) and biases (b_2). For the weights:

\frac{\partial L}{\partial W_2} = \frac{1}{m} \delta_2^T x

where:

  • m is the number of samples,
  • \delta_2 is the error calculated above.
  • The bias gradient is computed as:

\frac{\partial L}{\partial b_2} = \frac{1}{m} \sum_{i=1}^{m} \delta_2^{(i)}

iv. Update Weights and Biases:

  • After calculating the gradients, the decoder updates its weights and biases using the Adam optimizer. This optimizer uses the gradients to adjust the parameters in a way that minimizes the reconstruction error.
  • The update for each parameter is done using the following formula:

W_2 \leftarrow W_2 - \alpha \frac{\hat{m_2}}{\sqrt{\hat{v_2}} + \epsilon}

where:

  • W_2 is the weight matrix,
  • \alpha is the learning rate,
  • \hat{m_2} and \hat{v_2} are the bias-corrected moment estimates from Adam.

class Decoder:
    def __init__(self, hidden_size, output_size, optimizer):
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.optimizer = optimizer
        self.W2, self.b2 = init_params(hidden_size, output_size)
        # Adam moment estimates for the weights and biases
        self.m_w2 = self.v_w2 = np.zeros_like(self.W2)
        self.m_b2 = self.v_b2 = np.zeros_like(self.b2)

    def forward(self, x):
        self.x = x
        self.z2 = dense(x, self.W2.T, self.b2.T)   # linear transformation
        self.a2 = sigmoid(self.z2)                 # squash outputs to (0, 1)
        return self.a2

    def backward(self, y):
        m = y.shape[0]
        self.dz2 = self.a2 - y                     # error between reconstruction and target
        self.dW2 = np.dot(self.dz2.T, self.x) / m
        self.db2 = np.sum(self.dz2, axis=0, keepdims=True).T / m

    def update(self):
        self.W2, self.m_w2, self.v_w2 = self.optimizer.update(self.W2, self.dW2, self.m_w2, self.v_w2)
        self.b2, self.m_b2, self.v_b2 = self.optimizer.update(self.b2, self.db2, self.m_b2, self.v_b2)
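Chaining the two pieces on random data (again, just a hypothetical shape check) shows the round trip from 784 dimensions down to 256 and back:

opt = AdamOptimizer(lr=0.001)
enc = Encoder(784, 256, opt)
dec = Decoder(256, 784, opt)

batch = np.random.rand(32, 784)
reconstruction = dec.forward(enc.forward(batch))
print(reconstruction.shape)       # (32, 784), squashed into (0, 1) by the sigmoid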

Building the Autoencoder

To create the Autoencoder, we combine three core components: the Encoder, the Decoder, and the Optimizer. These components work together to compress the input data into a latent space representation and then reconstruct the data from that compressed form.

Figure 6: Assembling the Autoencoder

The Encoder is the first part of the Autoencoder. It takes the input data and transforms it into a lower-dimensional space, which is known as the latent space. It achieves this through a dense layer followed by a ReLU activation. The weights and biases in this layer are initialized using a method (init_params), which helps in the transformation process. The encoder's main task is to learn a compressed representation of the data. The forward pass of the encoder calculates the linear transformation of the input, and the activation function (ReLU) introduces non-linearity.

The Decoder, on the other hand, takes this compressed representation from the encoder and reconstructs it to approximate the original input. It also uses a dense layer, but the activation function here is sigmoid, which helps in bringing the output back to the same range as the original input data. The decoder’s forward pass computes the output by performing a linear transformation of the encoder’s output and applying the sigmoid activation.

For optimizing the learning process, we use the AdamOptimizer. The optimizer helps in adjusting the weights and biases of both the encoder and decoder by minimizing the reconstruction error. The Adam optimizer is efficient for training deep networks and handles the weight updates based on the gradients computed during backpropagation. The optimizer uses momentum and adaptive learning rates, making it faster and more stable during training compared to traditional stochastic gradient descent.

Training the Autoencoder

Training the autoencoder involves several steps where the model learns to minimize the reconstruction error between the input data and the reconstructed output. During training, the autoencoder is fed with batches of data, and the model computes the forward and backward passes in each iteration.

The forward pass involves passing the input through the encoder to get the compressed representation and then through the decoder to get the reconstructed output. The loss function (binary cross-entropy in this case) measures the difference between the input and the reconstructed data. This loss guides the backpropagation process.

In the backward pass, we compute the gradients of the loss with respect to the weights and biases of both the encoder and decoder. These gradients are then used to update the weights and biases using the Adam optimizer. The optimizer adjusts the parameters to reduce the reconstruction error over time.

The training process repeats for multiple epochs, where in each epoch, the model is trained over batches of data. After every epoch, the average loss and the accuracy are printed to monitor the progress. The accuracy is computed to show how well the model is reconstructing the input.

At the end of training, the autoencoder learns to compress the input data and reconstruct it accurately. The performance of the model is tracked through the loss and accuracy metrics plotted over the epochs, providing a clear visualization of how the model improves during training.

class Autoencoder:
    def __init__(self, input_size, hidden_size, optimizer):
        self.encoder = Encoder(input_size, hidden_size, optimizer)
        self.decoder = Decoder(hidden_size, input_size, optimizer)
        self.optimizer = optimizer
        self.loss_history = []
        self.total_params = self.calculate_total_params()

    def calculate_total_params(self):
        encoder_params = np.prod(self.encoder.W1.shape) + np.prod(self.encoder.b1.shape)
        decoder_params = np.prod(self.decoder.W2.shape) + np.prod(self.decoder.b2.shape)
        return encoder_params + decoder_params

    def summary(self):
        # Header
        print("------------------------------------------------------------")
        print(f"{'Layer (Type)':<20} {'Output Shape':<20} {'Param #':<10}")
        print("============================================================")

        # Encoder details
        encoder_params = np.prod(self.encoder.W1.shape) + np.prod(self.encoder.b1.shape)
        print(f"Encoder (Dense):{'':<9} ({self.encoder.hidden_size},) {encoder_params:<10}")

        # Decoder details
        decoder_params = np.prod(self.decoder.W2.shape) + np.prod(self.decoder.b2.shape)
        print(f"Decoder (Dense):{'':<9} ({self.decoder.output_size},) {decoder_params:<10}")

        # Footer
        print("============================================================")
        print(f"Total Parameters: {self.total_params}")
        print(f"Trainable Parameters: {self.total_params}")
        print(f"Non-trainable Parameters: 0")
        print("------------------------------------------------------------")

    def forward(self, x):
        self.encoded = self.encoder.forward(x)
        self.decoded = self.decoder.forward(self.encoded)
        return self.decoded

    def backward(self, y):
        self.decoder.backward(y)
        self.decoder.update()
        self.encoder.backward(y)
        self.encoder.update()

    def train(self, x, y, epochs, batch_size, threshold=0.1):
        m = x.shape[0]
        self.accuracy_history = []

        for epoch in range(epochs):
            epoch_loss = 0
            all_predictions = []
            all_true_values = []

            for i in range(0, m, batch_size):
                x_batch = x[i:i + batch_size]
                y_batch = y[i:i + batch_size]

                self.forward(x_batch)
                self.backward(y_batch)

                batch_loss = bce_loss(y_batch, self.decoded)
                epoch_loss += batch_loss

                all_predictions.append(self.decoded)
                all_true_values.append(y_batch)

            avg_epoch_loss = epoch_loss / (m // batch_size)
            self.loss_history.append(avg_epoch_loss)

            all_predictions = np.vstack(all_predictions)
            all_true_values = np.vstack(all_true_values)

            accuracy = calculate_accuracy(all_true_values, all_predictions, threshold)
            self.accuracy_history.append(accuracy)

            print("-----------------------------------------------")
            print(f'Epoch {epoch + 1}/{epochs} - Loss: {avg_epoch_loss:.4f}, Accuracy: {accuracy}%')

        self.plot_metrics()

    def predict(self, x):
        return self.forward(x)

    def evaluate(self, x, y):
        predictions = self.predict(x)
        loss = bce_loss(y, predictions)
        return loss

    def plot_metrics(self):
        plt.figure(figsize=(10, 5))
        plt.plot(self.loss_history, label="Training Loss", color='blue')
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.title('Training Loss Over Epochs')
        plt.legend()
        plt.grid()
        plt.show()

        plt.figure(figsize=(10, 5))
        plt.plot(self.accuracy_history, label="Training Accuracy", color='orange')
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy (%)')
        plt.title('Training Accuracy Over Epochs')
        plt.legend()
        plt.grid()
        plt.show()




learning_rate = 0.001
batch_size = 64
epochs = 300

optimizer = AdamOptimizer(lr=learning_rate)
autoencoder = Autoencoder(784, 256, optimizer)
autoencoder.summary()
autoencoder.train(train_data, train_data, epochs, batch_size)

Output

After training the autoencoder for 300 epochs, the model achieved a loss of 0.0868 and an accuracy of 90.2%, demonstrating significant progress. Initially, in the first epoch, the loss was 0.3840, and the accuracy was 21.5%, indicating that the model was just beginning to learn and adapt to the data. As the training continued, the optimizer gradually adjusted the weights, reducing the error and increasing the model’s accuracy. This shows how the model improved its predictions over time, reflecting the effectiveness of the training process and the optimizer in minimizing the loss and improving the performance.

Figure 7: Epoch 300
Figure 8: Training Loss and Accuracy Graph

Testing the Model

Testing the model involves evaluating its performance on the test data and visualizing how well the autoencoder can reconstruct the input images. First, we calculate the test loss using the evaluate method, which computes the loss between the original test images and their reconstructed versions. The loss provides a measure of how well the model generalizes to unseen data.

After evaluating the loss, we visualize the results by displaying a set of 10 images. For each image, we show three parts:

  1. Original Image: This is the input image from the test dataset.
  2. Encoded Image: This represents the encoded feature map, where the autoencoder compresses the image to a smaller size. It’s the output after passing the image through the encoder layer.
  3. Reconstructed Image: This is the final output after the decoder reconstructs the image from the encoded features.

The process of displaying the images side by side helps to compare how well the autoencoder is able to learn and reconstruct the images, providing an intuitive understanding of the model’s performance. The closer the reconstructed images are to the original images, the better the model has learned to encode and decode the data.

# Testing the model
test_loss = autoencoder.evaluate(test_data, test_data)
print(f'Test Loss: {test_loss}')

n = 10

# Compute the encoded and reconstructed versions of the test set once, up front
encoded_data = autoencoder.encoder.forward(test_data)
reconstructed_data = autoencoder.predict(test_data)

plt.figure(figsize=(20, 6))

for i in range(n):
    # Original Image
    ax = plt.subplot(3, n, i + 1)
    plt.title("Original")
    plt.imshow(test_data[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Encoded Image (256 hidden units shown as a 16x16 grid)
    ax = plt.subplot(3, n, i + 1 + n)
    plt.title("Encoded")
    plt.imshow(encoded_data[i].reshape(16, 16))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Reconstructed Image
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.title("Reconstructed")
    plt.imshow(reconstructed_data[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.tight_layout()
plt.show()
Figure 9: Output

Conclusion

In conclusion, this autoencoder model demonstrates the power of unsupervised learning in compressing and reconstructing data. By effectively utilizing an encoder-decoder architecture and leveraging the Adam optimizer, we’ve built a robust model that not only achieves impressive accuracy but also showcases the potential of neural networks in handling complex data. The journey from an initial loss of 0.38 to roughly 90% reconstruction accuracy highlights the model’s capability to learn and improve over time, making it a valuable tool for tasks like data denoising, anomaly detection, and dimensionality reduction.

Written by Akash Nath

AI/ML geek building cool stuff | 4x Hackathon Winner | Building SaaS products
