Multiple Image Classification in Python using TensorFlow

By Praatibh Surana

In this project, we will learn how to classify different objects using a neural network built from scratch. We will classify five types of furniture.

Image classification is a very common problem statement that can be solved with the help of neural networks. There are various ways of creating our model: we can either make use of pre-trained models such as VGG-16 (trained on the ImageNet dataset), or create a model from scratch. For this article, we will create a model from scratch. This will help us gain a better understanding of the problem statement and will also be easier to implement, since the pre-trained models have a large number of layers and can be unnecessarily complex and cumbersome to train.
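For reference, this is roughly how one of those pre-trained models could be pulled in through Keras if you wanted to go down the transfer-learning route instead (we will not use this here):

import tensorflow as tf

# Load VGG-16 with weights pre-trained on ImageNet, without its original classifier head
base = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
base.trainable = False # freeze the convolutional base so only a new classifier head would be trained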

Dataset

The dataset we are going to be looking at consists of five different types of furniture objects - sofa, chair, bed, table and swivelchair. You can access the dataset from this Kaggle link.

Installing Dependencies

Before we start creating and training our model, let us first look at the dependencies required for this project. We will need TensorFlow, OpenCV and a couple of other libraries. To install them, go to your terminal/command prompt and run the following commands:

pip install tensorflow
pip install opencv-python
pip install matplotlib

Additionally, you may also have to install NumPy:

pip install numpy
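To confirm that everything installed correctly, you can run a quick version check from a Python shell:

import tensorflow as tf
import cv2
import numpy as np
import matplotlib

# Print the installed versions to confirm all the imports work
print(tf.__version__, cv2.__version__, np.__version__, matplotlib.__version__)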

There! We now have all the dependencies required to execute our code. Let us now start the coding by first importing all the libraries we need.

 

Importing Libraries

import numpy as np
import tensorflow as tf
import os
import cv2
from tensorflow.keras import layers
import matplotlib.pyplot as plt

 

Next, we need to focus on our dataset. You can view an image from the dataset to get a brief idea of what you're dealing with. Execute the code snippet given below to look at the image:

furniture = cv2.imread('') # Path to the image you want to view (imread returns None if the path is wrong)
furniture = cv2.cvtColor(furniture, cv2.COLOR_BGR2RGB) # OpenCV reads images in BGR order, so convert to RGB for Matplotlib
plt.imshow(furniture)
plt.axis('off')
plt.show()

 

Preprocessing the images and preparing the dataset

We need to prepare our dataset so that the images are uniform; this makes it easy to feed them into the neural network and avoids errors that could arise from differences in dimensions, channels, etc.

Ideally, we want all images to be of the same size and have the same number of color channels, that is, RGB or grayscale. We can wrap these requirements in a single function so that we can call it whenever we need to. The function should look something like this:

# Preprocessing function
def preprocess(cat, split, label):
    train_images = []
    train_labels = []
    for i in os.listdir(cat):
        image = cv2.imread(cat + '/' + i) # Reading the image as a matrix of pixel values
        res = cv2.resize(image, dsize=(128,128), interpolation=cv2.INTER_CUBIC) # Resize to 128x128
        gray = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY) # Converting to grayscale
        train_images.append(gray)
        train_labels.append(label)

    # The first split*size images form the validation (test) set, the rest form the training set
    size = len(train_images)
    return train_images[int(split*size):], train_images[:int(split*size)], train_labels[int(split*size):], train_labels[:int(split*size)]

 

We convert the images to grayscale images of size 128 by 128. Converting to grayscale reduces each image from three color channels to one, which simplifies the input and makes the model easier to train. A 128 by 128 image is also large enough for the model to pick out the relevant features. You could choose to go with bigger or smaller sizes, depending on how much time and computational power you're willing to spend.
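If you want to see what these two steps do to a single image before writing the full function, a quick check looks like this (the file path below is just a placeholder):

image = cv2.imread('') # Placeholder: path to any image from the dataset
res = cv2.resize(image, dsize=(128,128), interpolation=cv2.INTER_CUBIC)
gray = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)
print(image.shape, res.shape, gray.shape) # e.g. (h, w, 3) -> (128, 128, 3) -> (128, 128)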

Before we understand how the function works, we must know the directory structure of the dataset. It looks something like this:

->Train
  ->Bed
  ->Sofa
  ->Chair
  ->Swivelchair
  ->Table

Now that we know the directory structure, let us understand what the function does. It takes three parameters: 'cat', 'split' and 'label'.

The 'cat' parameter takes the file path to a particular object's directory as input. A loop then goes through all the images in that directory and extracts their pixel information. The 'split' parameter lets us decide how large we want our validation set to be; it takes values between 0 and 1, so a value of 0.2 reserves 20% of the images for validation. Lastly, the 'label' parameter is used to manually assign a numeric label to the object. For instance, if I am in the chair folder and want to label all my chair images as '1', I simply pass 1 as 'label'.

The code snippet below will give us a better understanding:

# Preparing train and test sets
train = []
test = []
labeltrain = []
labeltest = []
# Bed
train_images, test_images, train_labels, test_labels = preprocess('../input/day-3-kaggle-competition/data_comp/data_comp/train/bed', 0.2, 0)
train.extend(train_images)
test.extend(test_images)
labeltrain.extend(train_labels)
labeltest.extend(test_labels)

# Chair
train_images, test_images, train_labels, test_labels = preprocess('../input/day-3-kaggle-competition/data_comp/data_comp/train/chair', 0.15, 1)
train.extend(train_images)
test.extend(test_images)
labeltrain.extend(train_labels)
labeltest.extend(test_labels)

# Sofa
train_images, test_images, train_labels, test_labels = preprocess('../input/day-3-kaggle-competition/data_comp/data_comp/train/sofa', 0.15, 2)
train.extend(train_images)
test.extend(test_images)
labeltrain.extend(train_labels)
labeltest.extend(test_labels)

# Swivelchair
train_images, test_images, train_labels, test_labels = preprocess('../input/day-3-kaggle-competition/data_comp/data_comp/train/swivelchair', 0.1, 3)
train.extend(train_images)
test.extend(test_images)
labeltrain.extend(train_labels)
labeltest.extend(test_labels)

# Table
train_images, test_images, train_labels, test_labels = preprocess('../input/day-3-kaggle-competition/data_comp/data_comp/train/table', 0.15, 4)
train.extend(train_images)
test.extend(test_images)
labeltrain.extend(train_labels)
labeltest.extend(test_labels)

train = np.array(train)
test = np.array(test)
labeltrain = np.array(labeltrain)
labeltest = np.array(labeltest)

train = train.reshape(train.shape[0], 128, 128, 1).astype('float32')
train = (train - 127.5) / 127.5 # Normalize the images to [-1, 1]

test = test.reshape(test.shape[0], 128, 128, 1).astype('float32')
test = (test - 127.5) / 127.5 # Normalize the images to [-1, 1]

 

As is evident from the code snippet, we've assigned 5 different labels (0 to 4) to the 5 object classes that we'll be dealing with.
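As a quick sanity check (an optional addition, not part of the original script), you can print the array shapes and how many training images ended up in each class:

print(train.shape, test.shape) # (num_train, 128, 128, 1) and (num_test, 128, 128, 1)
print(np.unique(labeltrain, return_counts=True)) # number of training images per class label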

Now that we've finished preparing the dataset, we can start designing our model.

 

Designing and training the model

The general consensus is that convolutional neural networks work best when dealing with images. Hence, we will design a neural network consisting of convolutional layers. Adding too many layers is not necessarily the right thing to do, so we will stick with only a few. Nothing overly fancy: a repeated combination of Conv2D and MaxPooling2D layers followed by a few Dense layers. We make use of TensorFlow's Keras API to build the model. It is as follows:

# Model
model = tf.keras.models.Sequential([
    layers.Conv2D(16, (3,3), activation='relu', input_shape=(128, 128, 1)),
    layers.MaxPooling2D(2,2),
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Conv2D(32, (3,3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.0028)),
    layers.MaxPooling2D(2,2),
    layers.Dropout(0.75),
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(5, activation='softmax')])

model.summary() # Displays the network architecture and the input/output dimensions for each layer

Notice how I've added a dropout layer and L2 regularization. There are no fixed values for these; I simply found that these settings worked best for this model. Feel free to experiment by changing the number of neurons, the types of activations used, etc.

We're almost there! Now all that's left to do is train our model. We use the well-known Adam optimizer to update the weights and the sparse categorical cross-entropy loss, since we're dealing with multiple classes (five) and our labels are plain integers rather than one-hot vectors. Also, it is advisable to train the model on a GPU, as it will take some time on a CPU. If you have a GPU-enabled TensorFlow setup, that's great; if not, you can run this code on free cloud GPUs through Kaggle or Google Colab. The model below took about a minute to complete 30 epochs. Feel free to experiment with the number of epochs and other parameters until you get a good result. Run this code snippet to start the training:

model.compile(optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])


history = model.fit(train, labeltrain, epochs = 30,
                    validation_data=(test, labeltest), verbose=2)
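Since model.fit returns a History object, you can optionally plot the accuracy curves to see how training progressed (this plotting step is my own addition, not part of the original packet; on older TensorFlow versions the metric key may be 'acc' instead of 'accuracy'):

# Plot training and validation accuracy for each epoch
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()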

 

That's it! Our model has been trained successfully and can now predict what type of furniture is in an image. I obtained a training accuracy above 95% and a validation accuracy close to 90%.
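To get a prediction for a single image, you can pass one of the preprocessed test images through the model and map the predicted index back to a class name, using the same label order assigned during preprocessing (this snippet is my own illustration):

# Class names in the same order as the integer labels assigned earlier
class_names = ['bed', 'chair', 'sofa', 'swivelchair', 'table']

probs = model.predict(test[:1]) # softmax probabilities for the first test image
print('Predicted:', class_names[np.argmax(probs[0])], '| Actual:', class_names[labeltest[0]])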

The Python script was written and executed on a Kaggle notebook.

 
