Hand-written digit detection in Python

What is Handwritten Digit Recognition?

Handwritten digit recognition refers to the capability of computers to identify human handwritten numbers. This task is challenging for machines because handwritten digits are often imperfect and can vary significantly in appearance. Handwritten digit recognition addresses this issue by analyzing an image of a digit and determining the number depicted in the image.

Building a Python Project

Below are the steps to implement the handwritten digit recognition project:

1. Import the libraries

The following are the imported libraries:

  • os : A standard-library module for interacting with the operating system, enabling file and directory management.
  • cv2 : The Python module of OpenCV, a computer vision library that facilitates image and video processing.
  • numpy : A fundamental package for numerical computations, beneficial for handling arrays and matrices.
  • matplotlib.pyplot : A plotting library for creating visualizations such as charts and graphs.
  • tensorflow : An open-source library for machine learning and neural networks, used to build and train models.

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

2. Load the dataset

Keras ships with several built-in datasets, and MNIST is one of them. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image paired with a label from 0 to 9.

mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

3. Preprocess the data

Here, tf.keras.utils.normalize rescales the pixel values in each image of the dataset. The raw pixels are integers from 0 to 255. L2-normalizing along axis=1 divides each pixel column of an image by its Euclidean length, so all values end up between 0 and 1. Small, consistently scaled inputs are easier for the model to process, which helps the model’s performance and training speed.

X_train = tf.keras.utils.normalize(X_train, axis=1)
X_test = tf.keras.utils.normalize(X_test, axis=1)
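
To make the effect of the normalization concrete, here is a minimal sketch in plain Python (no TensorFlow needed) of L2 normalization applied to one tiny image. The helper name l2_normalize_columns and the 2×2 image are made up for illustration. The sketch assumes a batch shaped (N, 28, 28), where axis=1 runs down the rows, so each pixel column is divided by its own length.

```python
import math

def l2_normalize_columns(image):
    """Divide every column of a 2D list by its L2 norm (hypothetical helper)."""
    n_rows, n_cols = len(image), len(image[0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        norm = math.sqrt(sum(image[r][c] ** 2 for r in range(n_rows)))
        if norm == 0:
            norm = 1.0  # an all-zero column is left unchanged
        for r in range(n_rows):
            result[r][c] = image[r][c] / norm
    return result

# Column 0 is [3, 4] with length 5; column 1 is all zeros.
tiny_image = [[3, 0],
              [4, 0]]
normalized = l2_normalize_columns(tiny_image)
print(normalized)  # [[0.6, 0.0], [0.8, 0.0]]
```

Every value in the result lies between 0 and 1, which is the property the model benefits from.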

4. Create the model

We define a neural network model using TensorFlow’s Keras API. First, create a new, empty Sequential model, then add the following layers in order:

  • Flatten layer : reshapes each 28×28 pixel image into a 1D array of 784 values (28*28=784). This is necessary because dense layers expect 1D input.
  • First Dense layer : a fully connected (dense) layer with 128 neurons. The activation function relu (Rectified Linear Unit) helps the network learn complex patterns by introducing non-linearity.
  • Second Dense layer : another dense layer with 128 neurons, also using the relu activation function, to further process the data and learn more complex features.
  • Output layer : a final dense layer with 10 neurons, corresponding to the 10 possible digit classes (0-9). The softmax activation function converts the outputs to probabilities that sum to 1, making it suitable for multi-class classification.
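
The layer descriptions above can be sketched as a tiny forward pass in plain Python. This is an illustration only, not how Keras implements the layers: the 2×2 image, the layer sizes, and all weights are made-up values, and the helper names relu, softmax and dense are hypothetical.

```python
import math

def relu(values):
    # Negative values become 0; positive values pass through.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtracting the maximum keeps exp() numerically stable.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Flatten: a 2x2 "image" becomes a 1D list of 4 values.
image = [[0.0, 1.0],
         [1.0, 0.0]]
flat = [pixel for row in image for pixel in row]

# Hidden layer with 2 neurons (made-up weights), followed by relu.
hidden = relu(dense(flat,
                    [[1.0, -1.0, 0.5, 0.0], [-1.0, 1.0, 1.0, 0.0]],
                    [0.0, 0.0]))

# Output layer with 3 classes (made-up weights), followed by softmax.
probs = softmax(dense(hidden,
                      [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]],
                      [0.0, 0.0, 0.0]))

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
print(flat, hidden, predicted_class)
```

The real model follows the same pattern with 784 inputs, two hidden layers of 128 neurons, and 10 output classes.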

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
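
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes each dense layer has one weight per input-neuron pair plus one bias per neuron. The helper name dense_params is made up for illustration.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# Flattened input, two hidden layers, output layer.
layer_sizes = [28 * 28, 128, 128, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [100480, 16512, 1290] 118282
```

The Flatten layer only reshapes the data, so it contributes no parameters.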

Following is the model architecture:

[Figure: model architecture]

5. Train the model

The code compiles the neural network model using the Adam optimizer and sparse categorical cross entropy loss, trains it on the training data for 6 epochs, and saves the trained model to a file named ‘handwritten.h5’. The results are attached below.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=6)
model.save('handwritten.h5')
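
To give a feel for what the loss measures, here is a minimal sketch of sparse categorical cross entropy for a single sample in plain Python. It is a simplification: it leaves out details such as averaging over a batch and guarding against a probability of exactly zero, and the probabilities are made-up values. "Sparse" means the true label is given as an integer class index rather than a one-hot vector.

```python
import math

def sparse_categorical_crossentropy(probabilities, true_label):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probabilities[true_label])

# Made-up softmax output over three classes.
probs = [0.1, 0.7, 0.2]

# The true class received a high probability, so the loss is small.
confident_loss = sparse_categorical_crossentropy(probs, 1)
# The true class received a low probability, so the loss is large.
wrong_loss = sparse_categorical_crossentropy(probs, 0)
print(round(confident_loss, 4), round(wrong_loss, 4))  # 0.3567 2.3026
```

During training, the optimizer adjusts the weights to push this value down across the training set.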

[Figure: training results]

6. Evaluate the model

We hold out 10,000 test images to assess our model’s performance. Because this testing data is separate from the training set, it shows how well the model generalizes to digits it has not seen before. The results are attached below; the model reaches about 97 percent accuracy.

loss, accuracy = model.evaluate(X_test, y_test)

print(loss)
print(accuracy)
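
The accuracy value is the fraction of test images whose most probable class matches the true label. A minimal sketch of that calculation in plain Python, using made-up predictions for four samples and a hypothetical helper named accuracy_score, looks like this:

```python
def accuracy_score(predicted_probs, true_labels):
    """Fraction of samples where the most probable class equals the true label."""
    predicted = [p.index(max(p)) for p in predicted_probs]
    correct = sum(1 for p, t in zip(predicted, true_labels) if p == t)
    return correct / len(true_labels)

# Made-up softmax outputs for four samples over three classes.
batch = [[0.8, 0.1, 0.1],
         [0.2, 0.5, 0.3],
         [0.3, 0.3, 0.4],
         [0.6, 0.3, 0.1]]
labels = [0, 1, 1, 0]

score = accuracy_score(batch, labels)
print(score)  # 0.75
```

Here the third sample is predicted as class 2 while its label is 1, so three of four predictions are correct.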

[Figure: evaluation results]

7. Predict the digits

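Here, we can either use scanned or photographed images of digits written by hand, or draw the digits in a paint application on the PC. Save each image at 28×28 pixels, since that is the input size the model expects, and name the files as follows.

[Figure: file structure]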

The following Python script loads the pre-trained neural network model for handwritten digit recognition from the file named ‘handwritten.h5’. It iterates over a series of digit images (named digit1.png, digit2.png, etc.) stored in a directory called ‘digits’. For each image, it preprocesses the image, makes a prediction using the loaded model, displays the predicted digit along with the corresponding image, and increments the image number for the next iteration. Some results are attached below the code.

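model = tf.keras.models.load_model('handwritten.h5')

img_num = 1
while os.path.isfile(f"digits/digit{img_num}.png"):
    try:
        # Keep a single colour channel; the image is expected to be 28x28 pixels.
        img = cv2.imread(f"digits/digit{img_num}.png")[:, :, 0]
        # Invert to light-on-dark (as in MNIST) and wrap in a batch of one image.
        img = np.invert(np.array([img]))
        # Apply the same normalization that was used on the training data.
        prediction = model.predict(tf.keras.utils.normalize(img, axis=1))
        print(f"The digit is identified as {np.argmax(prediction)}")
        plt.imshow(img[0], cmap=plt.cm.binary)
        plt.show()
    except Exception as error:
        # Report which file failed instead of silently hiding the cause.
        print(f"Could not process digit{img_num}.png: {error}")
    finally:
        img_num += 1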
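
The inversion step is needed because MNIST digits are light strokes on a dark background, whereas a digit drawn in a paint application is usually dark on white. On 8-bit pixel values, np.invert is a bitwise NOT, which works out to 255 minus the value. The following plain-Python sketch illustrates this with a made-up row of pixels; the helper name invert_uint8 is hypothetical.

```python
def invert_uint8(pixel):
    # Bitwise NOT restricted to 8 bits, which equals 255 - pixel.
    return ~pixel & 0xFF

# White paper (255) with a dark stroke (0) and a dark grey pixel (30).
row = [255, 255, 0, 30, 255]
inverted = [invert_uint8(p) for p in row]
print(inverted)  # [0, 0, 255, 225, 0]
```

After inversion the background is 0 and the stroke is bright, matching the convention of the training images.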

[Figures: prediction results 1 to 4]

We have successfully built a Python deep learning project on handwritten digit recognition.
