Coders Packet

Recognition of handwritten digits using a CNN in Python

By Shikhar Mishra

Prediction of handwritten digit values using a CNN model trained on the MNIST dataset with Keras (TensorFlow as the backend)

Our task is to correctly identify the digits in a dataset of tens of thousands of handwritten images. I have chosen to build the model with the Keras API (TensorFlow backend). First, I will prepare the data (the handwritten digit images), and then we will work on the CNN modeling and evaluation.

To keep the computation light, just set the number of epochs to 1; if you want to reach 99%+ accuracy, set it to more than 20.

This notebook follows five main parts:

⦿ The data preparation
⦿ The CNN modeling
⦿ Data Augmentation
⦿ Evaluation of model
⦿ The results prediction 

The data preparation consists of the following subparts:

Loading of the dataset: The MNIST (Modified National Institute of Standards and Technology) database contains 60,000 training images and 10,000 testing images. Each image is a 28 x 28 grayscale picture representing one of the digits 0-9. The dataset is freely available on the internet and can be easily downloaded.
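A minimal loading sketch is shown below, assuming the built-in Keras MNIST loader (the packet itself may instead load the data from CSV files with pandas):

```python
from tensorflow.keras.datasets import mnist

# Load the MNIST database: 60,000 training and 10,000 testing images
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)  # (60000, 28, 28)
print(X_test.shape)   # (10000, 28, 28)
```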

Normalization: We perform grayscale normalization to reduce the effect of illumination differences. Moreover, the CNN converges faster on [0..1] (smaller) data than on [0..255] (larger) data.
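A two-line sketch of this normalization step:

```python
# Scale the pixel values from the [0, 255] range down to [0, 1]
X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
```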

Reshaping of the data: The train and test images (28px x 28px) are flattened into 1D vectors of 784 values (for example, as rows of a pandas DataFrame when loaded from CSV). We then reshape all the data into 28x28x1 3D matrices, where the extra dimension is the single grayscale channel.
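As a sketch, the reshaping can be done directly with reshape calls on the arrays loaded above:

```python
# Reshape every image into a 28 x 28 x 1 matrix (one grayscale channel)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
```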

Label encoding: The labels are the ten digits from 0 to 9, stored as the dependent variable. We now encode these labels into one-hot vectors.
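A short sketch using Keras' to_categorical utility:

```python
from tensorflow.keras.utils import to_categorical

# e.g. the label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
```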

Splitting of the training and validation sets: I have chosen to split the train set into two parts: a small fraction (0.1) becomes the validation set, which is used repeatedly during evaluation, and the rest (0.9) is used to train the CNN model and learn its weights.
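A sketch of the split using scikit-learn's train_test_split (the random seed here is an arbitrary choice, not the packet's):

```python
from sklearn.model_selection import train_test_split

# Hold out 10% of the training data as the validation set
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=2
)
```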

The CNN modeling and evaluation is the most significant step, and it contains:

Definition of the model: My CNN architecture is In -> [[Conv2D -> relu] x 2 -> MaxPool2D -> Dropout] x 2 -> Flatten -> Dense -> Dropout -> Out. By combining convolutional and pooling layers, the CNN learns local features in the early layers and increasingly global features of the image in the later ones.

Dropout is a regularization technique that randomly drops a proportion of the network's neurons during training. This forces the network to learn features in a more distributed way, which improves generalization and hence helps reduce overfitting. The rectifier ("relu") activation function is used to add non-linearity to the network.

The Flatten layer converts the final feature maps into a single 1D vector, combining all the local features found by the previous convolutional layers of the network. The model ends with two fully-connected layers, the last of which (Dense(10, activation="softmax")) outputs the probability distribution over the ten classes.
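The sketch below builds this architecture with the Keras Sequential API; the filter counts (32 and 64), kernel sizes, Dense(256) width, and dropout rates are illustrative assumptions, while the layer pattern follows the description above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential([
    # First [[Conv2D -> relu] x 2 -> MaxPool2D -> Dropout] block
    Conv2D(32, kernel_size=(5, 5), padding="same", activation="relu",
           input_shape=(28, 28, 1)),
    Conv2D(32, kernel_size=(5, 5), padding="same", activation="relu"),
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.25),
    # Second block with more filters to capture more global features
    Conv2D(64, kernel_size=(3, 3), padding="same", activation="relu"),
    Conv2D(64, kernel_size=(3, 3), padding="same", activation="relu"),
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.25),
    # Flatten -> Dense -> Dropout -> Out
    Flatten(),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(10, activation="softmax"),
])
```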

Setting the optimizer and annealing the learning rate: I have chosen RMSprop (with default values), as it is a very effective optimizer. The RMSprop update adjusts the adaptive gradient (Adagrad) method in each iteration so as to reduce its aggressive, monotonically decreasing learning rate. I could have used the Stochastic Gradient Descent ("sgd") optimizer, but it converges relatively slowly.
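Compiling the model with RMSprop might look like this (categorical cross-entropy is the natural loss for one-hot encoded labels):

```python
from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(),  # Keras' default RMSprop hyperparameters
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```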

Now, to make the optimizer converge faster and closer to the global minimum of the loss function, I used the learning rate (LR) annealing technique. Employing the ReduceLROnPlateau function from keras.callbacks as the LR annealing function, I decided to halve the LR whenever the accuracy has not improved after 3 epochs.
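A sketch of this callback; the monitored metric name and the minimum learning rate are assumptions (older Keras versions use "val_acc" instead of "val_accuracy"):

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if validation accuracy has not improved for 3 epochs
learning_rate_reduction = ReduceLROnPlateau(monitor="val_accuracy",
                                            factor=0.5,
                                            patience=3,
                                            verbose=1,
                                            min_lr=1e-5)
```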

To keep away from the overfitting problem, we need to synthetically expand our handwritten digit dataset, which is where 'Data Augmentation' comes in.

For data augmentation, I chose to (see the sketch after this list):

⦿ Randomly rotate some training images by 10 degrees
⦿ Randomly zoom some training images by 10%
⦿ Randomly shift images horizontally by 10% of the width
⦿ Randomly shift images vertically by 10% of the height
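A sketch of these augmentations with Keras' ImageDataGenerator, reusing the arrays and callback defined in the earlier sketches (the batch size is an arbitrary assumption):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,        # randomly rotate images by up to 10 degrees
    zoom_range=0.10,          # randomly zoom by up to 10%
    width_shift_range=0.10,   # randomly shift horizontally by 10% of the width
    height_shift_range=0.10   # randomly shift vertically by 10% of the height
)
datagen.fit(X_train)

# Train on augmented batches; set epochs > 20 for 99%+ accuracy
history = model.fit(datagen.flow(X_train, y_train, batch_size=64),
                    epochs=1,
                    validation_data=(X_val, y_val),
                    callbacks=[learning_rate_reduction])
```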

For the evaluation of the model, we printed the confusion matrix and investigated the errors using a custom function, 'display_errors'. The most important errors were also the most intriguing ones, and the reason behind each can easily be guessed.
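The display_errors helper belongs to the packet's downloadable code and is not reproduced here; the confusion matrix itself can be computed roughly as follows:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Compare predicted classes with the true classes on the validation set
y_pred = model.predict(X_val)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_val, axis=1)

print(confusion_matrix(y_true, y_pred_classes))
```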

Finally, we predicted the results from the one-hot encoded outputs via NumPy's argmax() function, which selects the index with the maximum probability for each image.
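A final sketch of the prediction step:

```python
import numpy as np

# The model outputs a probability distribution over the 10 digits for each
# image; argmax picks the digit with the highest probability
probabilities = model.predict(X_test)
predicted_digits = np.argmax(probabilities, axis=1)
print(predicted_digits[:10])
```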

Download Complete Code
