A Hands-On Guide to Your First Deep Learning Project

An introductory hands-on approach to deep learning

Mar 24, 2023

Deep learning, a subset of machine learning, has made significant strides in recent years by solving complex problems in computer vision, natural language processing, and speech recognition. It uses artificial neural networks, which are loosely inspired by how the human brain learns and processes data. In today's issue we will build, hands-on, a neural network that recognizes handwritten digits.

The Problem

Let's begin with a simple and concise definition of our question/problem, inputs and outputs.

Recognize handwritten digits from a given image containing one handwritten single digit from 0 to 9. Output a label representing the digit found on the image.

Input: An image containing one handwritten single digit from 0 to 9.

Output: A label representing the digit from 0 to 9.

The Solution

There is a wide range of libraries and programming languages available for deep learning. One of the most popular combinations is Python with Keras on TensorFlow. TensorFlow is a powerful open-source library for machine learning and deep learning applications.

To follow along I recommend using Google Colab. It is also possible to run everything on your own machine, provided you have Python 3 and the necessary libraries installed.

The input data

The success of our deep learning model is tied to the input data. If you are not yet familiar with datasets/data and their importance I recommend you read my previous article: Fueling Machine Learning with Data.

We will use the popular MNIST dataset as the input (or fuel) to our deep learning neural network.

Figure 1 - MNIST dataset. Source Wikipedia.

The MNIST dataset contains 70 thousand images of 28 by 28 pixels, and each of these images has already been labeled. Creating such a dataset from scratch would require a lot of time and effort!

The machine learning code

Without further ado let’s look at the code needed to train our deep learning model:

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
loss, accuracy = model.evaluate(x_test, y_test)
print('Test accuracy:', accuracy)

While this code is not novel and has already been used to introduce deep learning in the past, it serves as a good starting point for learning the basic concepts.

It’s remarkable that with roughly 10 lines of code we have created a neural network capable of predicting a 0-9 digit from an input image. This is possible because Keras and TensorFlow provide a very convenient interface with a higher level of abstraction, allowing us to focus on the essential aspects without delving into the details of the underlying implementation.

Implementation details

Let's take a closer look at each line of code. I will break the code into 6 steps and give more details while introducing the new concepts.

Step 1: Importing libraries

import tensorflow as tf

Google Colab comes with the tensorflow library already installed. However, if you are running on your own machine you may need to install it first using pip (pip install tensorflow).

Step 2: Loading the dataset

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

The MNIST dataset is readily available through the TensorFlow library. We just need to load it. The dataset consists of 70,000 grayscale images of handwritten digits (0 to 9) and their corresponding labels. The dataset is split into 60,000 training images and 10,000 test images.

The two lines of code above load the training images (x_train) and labels (y_train), and the test images (x_test) and labels (y_test), from the MNIST dataset.
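To get a feel for how this data is laid out, here is a minimal sketch that uses a small random stand-in array instead of the real download. The names fake_images, fake_labels and scaled_images are made up for illustration. The real x_train has the same layout, only with 60,000 images instead of 5. The scaling step at the end is a common optional preprocessing step and is not part of the code used in this article.

```python
import numpy as np

# Stand-in for a small batch of MNIST images: 5 grayscale images of 28x28
# pixels, stored as unsigned 8-bit integers in the range 0..255.
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(5,), dtype=np.uint8)

print(fake_images.shape)  # (number of images, height, width)
print(fake_labels.shape)  # one integer label (0..9) per image
print(fake_images.dtype)

# Optional preprocessing (an assumption, not used in this article's code):
# scale the pixel values from 0..255 down to 0..1.
scaled_images = fake_images.astype("float32") / 255.0
print(scaled_images.min(), scaled_images.max())
```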

Step 3: Create the neural network layers

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

The Sequential class groups a stack of layers into a model. Our model definition consists of three main layers, plus a Dropout layer for regularization:

Input layer: With the input layer we are telling Keras about the shape of our input data. We use the "Flatten" input layer that converts the 2D 28x28 pixel images into a 1D array of 784 pixels.
Hidden layer: a "Dense" layer with 128 neurons and a ReLU activation function. Any layer between the input and output layers is called a hidden layer. Deep learning uses the term neurons because we are building layers of these so-called neurons, each connected to the layer before and after it, in a structure loosely inspired by the human brain. Each neuron uses a ReLU activation function, which returns 0 if its input is negative and returns the input unchanged otherwise.
Dropout layer: a "Dropout" layer placed after the hidden layer that randomly sets 20% of its inputs to 0 during training, which helps reduce overfitting. It has no effect when the model makes predictions.
Output layer: a "Dense" layer with 10 neurons (one for each digit from 0 to 9) and a softmax activation function to output probability scores. This is the final layer in the neural network that outputs the desired prediction.

To present this more visually:

Figure 2 - Image digit recognition neural network
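To make the Flatten, Dense and activation steps more concrete, here is a rough NumPy sketch of what happens to a single image as it passes through them. This is not how Keras implements it internally. The weights below are random placeholders (a trained model learns them), and the names relu, softmax, w_hidden and w_output are my own.

```python
import numpy as np

def relu(x):
    # Returns 0 for negative inputs and the input itself otherwise.
    return np.maximum(0.0, x)

def softmax(x):
    # Turns raw scores into probabilities that sum to 1.
    shifted = x - np.max(x, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

rng = np.random.default_rng(seed=0)

# Hypothetical, randomly initialized parameters.
w_hidden = rng.normal(0.0, 0.05, size=(784, 128))
b_hidden = np.zeros(128)
w_output = rng.normal(0.0, 0.05, size=(128, 10))
b_output = np.zeros(10)

image = rng.random((28, 28))                           # stand-in for one image
flat = image.reshape(1, 784)                           # Flatten: (28, 28) -> (1, 784)
hidden = relu(flat @ w_hidden + b_hidden)              # Dense(128, relu)
probabilities = softmax(hidden @ w_output + b_output)  # Dense(10, softmax)

print(probabilities.shape)  # one row with 10 probabilities
print(probabilities.sum())  # approximately 1.0

# Number of values the model has to learn: 784*128 + 128 + 128*10 + 10
parameter_count = w_hidden.size + b_hidden.size + w_output.size + b_output.size
print(parameter_count)  # 101770
```

Even this small network has about a hundred thousand adjustable values, which is what the training step later tunes.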

Step 4: Compile model

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Compile the model by specifying the optimizer, loss function, and evaluation metric. Let me provide definitions for each of these new concepts:

The loss function measures the difference between the model's predictions and the actual training data label for any given input. In other words, the loss function helps us measure how well our model is predicting the digits.
The optimizer is the algorithm used to adjust the model's parameters to minimize the loss function, improving the model's performance.
Finally the evaluation metric helps determine how well the model's predictions align with the actual output, providing valuable insights into the model's effectiveness.

I hope these definitions serve as a suitable introduction for now. I will introduce these concepts in more detail in later posts.
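In the meantime, here is a simplified sketch of what a loss like sparse categorical cross-entropy computes: the average negative log of the probability the model assigned to the correct label. The function below is my own simplified version, not the Keras implementation (which, among other things, guards against taking the log of 0), and the example uses 3 classes instead of 10 for brevity.

```python
import numpy as np

def sparse_categorical_crossentropy(probabilities, labels):
    """Mean negative log-probability assigned to the correct class."""
    rows = np.arange(len(labels))
    correct_class_probs = probabilities[rows, labels]
    return float(np.mean(-np.log(correct_class_probs)))

# Hypothetical predictions for 2 images over 3 classes.
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.1, 0.8]])

# Labels that agree with the model's most likely class give a low loss.
good_loss = sparse_categorical_crossentropy(probabilities, np.array([0, 2]))

# Labels the model considered unlikely give a much higher loss.
bad_loss = sparse_categorical_crossentropy(probabilities, np.array([1, 0]))

print(good_loss, bad_loss)
```

The optimizer's job is to nudge the model's parameters so that this number goes down.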

Step 5: Training the model

model.fit(x_train, y_train, epochs=5)

Fitting the model is the process of training a model on a dataset. The model learns to make predictions by adjusting its internal values using the optimizer and loss function.

An epoch is one complete pass through the entire training dataset during the fitting process. Multiple epochs are used to improve the model's learning.

In summary, 'fit' is the training process, and an 'epoch' is a full cycle through the dataset during this process.
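To illustrate what happens inside fit, here is a toy sketch of a training loop. It uses plain mini-batch gradient descent rather than the Adam optimizer, and a one-parameter model instead of a neural network, so it shows only the general idea. All variable names are made up. The first lines also show how many parameter updates one epoch involves with the Keras default batch size of 32.

```python
import math
import numpy as np

# With a batch size of 32, one epoch over 60,000 training images takes:
steps_per_epoch = math.ceil(60_000 / 32)
print(steps_per_epoch)  # 1875

# Toy problem: learn w in y = w * x, where the true value of w is 3.
x = np.linspace(0.0, 1.0, 64)
y = 3.0 * x
w = 0.0
learning_rate = 0.5
batch_size = 16
epoch_losses = []

for epoch in range(5):                          # 5 full passes over the data
    for start in range(0, len(x), batch_size):  # one update per mini-batch
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Gradient of the mean squared error with respect to w
        gradient = np.mean(2.0 * (w * xb - yb) * xb)
        w -= learning_rate * gradient
    epoch_losses.append(float(np.mean((w * x - y) ** 2)))
    print(f"epoch {epoch + 1}: loss={epoch_losses[-1]:.8f}")

print(w)  # close to 3
```

The loss printed after each epoch should shrink, which is the same pattern you see in the output of model.fit.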

Step 6: Evaluating the model

loss, accuracy = model.evaluate(x_test, y_test)

print('Test accuracy:', accuracy)

To evaluate the model's performance we need to use unseen data (data not used to train the model). That is why we use the test images and labels that were not used to train the model in step 5.

model.evaluate returns two values:

loss: The first value in the output is the loss value, which represents how far the model's predictions are from the actual output (or target) for the test dataset. A lower loss value generally indicates better model performance.

accuracy: The second value in the output is the evaluation metric, in this case, accuracy. Accuracy measures the proportion of correct predictions out of the total predictions made by the model. A higher accuracy value indicates better model performance. Note that this is just an example, and other metrics could be used.
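As a small illustration of how accuracy is derived from the model's output, here is a sketch with made-up probabilities for 4 images and 3 classes. The predicted label of each image is the class with the highest probability, and accuracy is the fraction of predicted labels that match the true labels.

```python
import numpy as np

# Hypothetical predicted probabilities: one row per image, one column per class.
probabilities = np.array([[0.1, 0.8, 0.1],
                          [0.6, 0.3, 0.1],
                          [0.2, 0.2, 0.6],
                          [0.5, 0.4, 0.1]])
true_labels = np.array([1, 0, 2, 1])

# The predicted label is the index of the highest probability in each row.
predicted_labels = np.argmax(probabilities, axis=1)

# Accuracy is the fraction of predictions that match the true labels.
toy_accuracy = float(np.mean(predicted_labels == true_labels))
print(predicted_labels)
print(toy_accuracy)  # 0.75 (3 of the 4 predictions are correct)
```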

For our example we are getting a test accuracy of about 95%:

Test accuracy: 0.9466000199317932

As a rough rule of thumb, an accuracy above 0.7 is often considered good and above 0.9 excellent, although what counts as good depends heavily on the problem and the dataset.

Next Steps

Once we train the model the next steps involve storing the model and using the model as part of a production system to make predictions. Let’s review how to accomplish these tasks.

Saving the model is done by simply calling the save function. (Newer Keras releases expect the path to end in a file extension such as .keras.)

model.save('my_mnist_model')

Loading the model:

from tensorflow.keras.models import load_model

my_mnist_model = load_model('my_mnist_model')

Finally, making predictions using the model:

import numpy as np

image_index = 0  # Any index from 0 to len(x_test) - 1

single_image = x_test[image_index]
single_label = y_test[image_index]

# The model expects a batch of images, so reshape the image to (1, 28, 28)
input_image = single_image.reshape(1, 28, 28)

# prediction has shape (1, 10): one probability per digit
prediction = my_mnist_model.predict(input_image)
predicted_label = np.argmax(prediction)

print(predicted_label)
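If the role of argmax is not obvious, this small sketch may help. The array below is a made-up example of what predict could return for one image: a row of 10 probabilities, one per digit. The index of the largest value is the predicted digit, and the value itself can be read as the model's confidence.

```python
import numpy as np

# Hypothetical output of predict for a single image: shape (1, 10).
example_prediction = np.array([[0.01, 0.01, 0.02, 0.03, 0.01,
                                0.02, 0.01, 0.85, 0.02, 0.02]])

# Index of the largest probability = predicted digit.
example_label = int(np.argmax(example_prediction[0]))

# The largest probability itself = how confident the model is.
example_confidence = float(np.max(example_prediction[0]))

print(example_label, example_confidence)  # 7 0.85
```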

We will create a couple of functions to help us visualize our results:

import matplotlib.pyplot as plt
import numpy as np

def plot_image(img):
    """Show the digit image without grid lines or axis ticks."""
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)

def plot_value_array(predictions_array, true_label):
    """Show the predicted label, the expected label and the model's confidence."""
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    predicted_label = np.argmax(predictions_array)
    # Blue when the prediction is correct, red otherwise
    color = 'blue' if predicted_label == true_label else 'red'
    plt.text(0.5, 0.75, f"Predicted: {predicted_label}", fontsize=14, ha='center', color=color)
    plt.text(0.5, 0.5, f"Expected: {true_label}", fontsize=14, ha='center', color=color)
    # The highest softmax probability is the model's confidence in this prediction
    plt.text(0.5, 0.25, f"Confidence: {np.max(predictions_array) * 100:.2f}%", fontsize=14, ha='center', color=color)

plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1)
plot_image(single_image)
plt.subplot(1, 2, 2)
plot_value_array(prediction[0], single_label)
plt.show()

Running this should display the digit image alongside its predicted and expected labels.

If you followed along, then you have successfully built and trained your first deep learning model to recognize handwritten digits using the MNIST dataset. There is more to this, of course; we are just scratching the surface here. I hope you gained some insights into the basics of deep learning today!

On The Road To AI
