Coders Packet

MNIST Handwritten digits classification using SVM and sklearn

By Vedant Keshav Jadhav

A machine learning project that uses the scikit-learn (sklearn) Python library to classify images of handwritten digits into their classes, 0 through 9. The classification model used here is a Support Vector Machine (SVM).

MNIST handwritten digit classification is one of the standard introductory problems in machine learning. We are given a dataset of 70,000 images, each 28 x 28 pixels (784 pixels in total). The dataset can be fetched from OpenML through scikit-learn's fetch_openml function. In this dataset the images are already flattened: the data has shape (70000, 784) rather than (70000, 28, 28). An image is just an array of pixels, and each pixel is one element of that array.
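
The loading step described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the helper names load_mnist and unflatten are hypothetical, and it assumes the OpenML copy of the dataset named "mnist_784", which needs network access the first time it is downloaded.

```python
import numpy as np


def load_mnist():
    """Download MNIST from OpenML (hypothetical helper; needs network access)."""
    # Imported lazily so the reshape helper below works without scikit-learn.
    from sklearn.datasets import fetch_openml

    # as_frame=False returns plain NumPy arrays instead of a pandas DataFrame.
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    return X, y.astype(int)  # labels arrive as strings such as "5"


def unflatten(X, side=28):
    """Turn rows of side*side pixels back into (n_images, side, side) arrays."""
    X = np.asarray(X)
    return X.reshape(-1, side, side)
```

Calling unflatten on the (70000, 784) array would give an array of shape (70000, 28, 28), which is the form needed to display an individual digit.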

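The images are grayscale: each one is a single 2D array of pixel intensities with no colour channels, so it appears only in shades of black and white. The values range from 0 to 1 once scaled (the raw OpenML data stores intensities from 0 to 255). A handwritten 9, for example, looks like this:

[Figure: sample image of a handwritten 9]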
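
To make the idea of a grayscale image as a 2D array concrete, here is a small sketch that scales intensities to the 0 to 1 range and prints an image as text. The function names scale_pixels and to_ascii are hypothetical, the 3 x 3 array is made-up data rather than a real MNIST digit, and the code assumes raw intensities run from 0 to 255.

```python
import numpy as np


def scale_pixels(image):
    """Map raw 0-255 intensities onto the 0-1 range."""
    return np.asarray(image, dtype=float) / 255.0


def to_ascii(image, threshold=0.5):
    """Render a 2D image with values in [0, 1]: '#' for ink, '.' for background."""
    return "\n".join(
        "".join("#" if pixel > threshold else "." for pixel in row)
        for row in image
    )


# A tiny hypothetical 3x3 "image" of a vertical stroke, not real MNIST data.
tiny = np.array([[0, 255, 0],
                 [0, 255, 0],
                 [0, 255, 0]])
print(to_ascii(scale_pixels(tiny)))
```

The same two functions would work on a 28 x 28 MNIST image once a flattened row has been reshaped.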

For the classification, we use the SVM implementation from the scikit-learn Python library.

First, we split the dataset, along with its labels, into training and testing sets. The training data is what the model learns from. The testing data is held back and used to evaluate the trained model. If accuracy is high on both sets, the model is a good fit: it has learned the training data without overfitting to it.
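
A split like the one described above could be done with scikit-learn's train_test_split. The sketch below uses random stand-in data with MNIST's column count so that it runs without a download. The 80/20 ratio, the stratification, and the random seed are assumptions, since the write-up does not state how the split was made.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data shaped like flattened MNIST rows; swap in the real X and y.
rng = np.random.default_rng(0)
X = rng.random((1000, 784))
y = rng.integers(0, 10, size=1000)

# Hold back 20% for testing (an assumed ratio); stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```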

After splitting the dataset, we fit the SVM model to the training data. scikit-learn makes this straightforward with a single call to the model's fit method. Once training completes, we run the same model on the testing data and get an array of predictions.
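
The fit-then-predict flow can be sketched as below. To keep the example quick and offline, it uses scikit-learn's small bundled digits set (1,797 images of 8 x 8 pixels) as a stand-in for MNIST. The RBF kernel and default hyperparameters are also assumptions, as the write-up does not say which settings were used.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: scikit-learn's bundled 8x8 digits set, not the 28x28 MNIST images.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = SVC(kernel="rbf")            # kernel choice is an assumption
model.fit(X_train, y_train)          # learn from the training data
predictions = model.predict(X_test)  # one predicted label per test image
print(predictions[:10])
```

Training on the full 70,000-image MNIST set follows the same three calls but takes considerably longer, because SVM training time grows quickly with the number of samples.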

Comparing the predictions with the true labels for the testing data gives the accuracy of the model, that is, the fraction of unseen images it classifies correctly. In our case, the accuracy is above 97%.
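
Accuracy here is simply the fraction of predictions that match the true labels. The sketch below computes it by hand with NumPy. The function name accuracy and the five labels are hypothetical illustration data, not results from the project. scikit-learn's accuracy_score in sklearn.metrics computes the same quantity.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Hypothetical labels for illustration: four of five predictions are correct.
y_test = [7, 2, 1, 0, 4]
predictions = [7, 2, 1, 0, 9]
print(accuracy(y_test, predictions))  # 0.8
```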

The code is written in a Jupyter notebook, which provides a convenient interactive environment for machine learning projects.
