Content-Based-Image-Retrieval


CBIR refers to the process of retrieving images that are relevant to a query image from a large collection, based on their visual content. We extract the features of the query image using a trained convolutional neural network (CNN) and then rank all the images in the collection by the cosine similarity between the query's feature vector and each image's feature vector.
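A minimal NumPy sketch of the ranking step, assuming the query and collection features have already been extracted as vectors. The function name `rank_by_cosine` and the 512-dimensional random features are illustrative stand-ins, not taken from the project:

```python
import numpy as np

def rank_by_cosine(query_feat, collection_feats):
    """Return (indices, similarities) of collection_feats, most similar to query_feat first."""
    q = query_feat / np.linalg.norm(query_feat)
    c = collection_feats / np.linalg.norm(collection_feats, axis=1, keepdims=True)
    sims = c @ q                  # cosine similarity between each collection image and the query
    order = np.argsort(-sims)     # indices sorted by descending similarity
    return order, sims[order]

# Toy data: random 512-d vectors standing in for real CNN features.
rng = np.random.default_rng(0)
collection = rng.normal(size=(10, 512))
query = collection[3] + 0.01 * rng.normal(size=512)   # near-duplicate of image 3
order, scores = rank_by_cosine(query, collection)
print(order[:3])   # image 3 is ranked first
```

Normalising both sides first turns the dot product into cosine similarity, so a single matrix-vector product scores the whole collection at once.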

The dataset contains 2,000 images in total: 1,500 were used for training and 500 for testing.

Two different CNN models were defined. The first model is a truncated VGG16 model with input shape (100, 100, 3) and output shape (6, 6, 512). All 1,500 training images were fed into the first model to extract primary features of shape (1500, 6, 6, 512). The architecture of the first model is as follows:

[Figure: Model-1 architecture]
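A VGG16 prefix that maps (100, 100, 3) to (6, 6, 512) is consistent with cutting the network after `block4_pool` (`block5_conv3` has the same shape). The pure-Python sketch below only traces the spatial sizes; the exact cut point is shown in the figure, so the choice of `block4_pool` here is an assumption:

```python
# VGG16 convolutions use 'same' padding, so they keep the spatial size;
# each block ends with a 2x2 max-pool (stride 2, 'valid'), which floors the size by half.
VGG16_BLOCKS = [("block1", 64), ("block2", 128), ("block3", 256), ("block4", 512), ("block5", 512)]

def vgg16_pool_shapes(height, width):
    """Output shape (H, W, C) after each VGG16 pooling layer for a given input size."""
    shapes = {}
    h, w = height, width
    for name, channels in VGG16_BLOCKS:
        h, w = h // 2, w // 2
        shapes[f"{name}_pool"] = (h, w, channels)
    return shapes

for layer, shape in vgg16_pool_shapes(100, 100).items():
    print(layer, shape)   # block4_pool gives (6, 6, 512), the stated Model-1 output
```

In Keras, one plausible way to build such a model is `tf.keras.applications.VGG16(include_top=False, input_shape=(100, 100, 3))` followed by a `tf.keras.Model` whose output is `base.get_layer("block4_pool").output`; this is not necessarily the construction used in the project.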

The second model is also a CNN, with input shape (18, 18, 512) and output shape (512,). The primary features of all training images extracted by the first model were fed to the second model to obtain the final (secondary) features, of shape (1500, 512). The architecture of the second model is as follows:

[Figure: Model-2 architecture]

The secondary (final) features extracted by the second model are used to define the loss function and train the model; only the last layer of the second model is trainable. First, a number n is chosen at random. For the ith image, the n training images (excluding the ith image itself) whose features are nearest to the ith image's feature are selected, and their features are averaged to give a mean feature vector. The dot product between the ith image's feature and this mean vector is then computed. The same is done for every other training image, and the average of all these dot products is used as the loss function. The goal is to maximise this loss function.
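The neighbour-mean objective can be sketched in NumPy as follows. The text does not say how "nearest" is measured, so cosine similarity is assumed; `neighbour_means` and `objective` are hypothetical names, n = 5 is an arbitrary example value, and the random matrix only stands in for the real (1500, 512) secondary features. The sketch evaluates the objective; it does not train anything:

```python
import numpy as np

def neighbour_means(feats, n):
    """For each row i, the mean of the n other rows nearest to row i.

    Assumption: 'nearest' means highest cosine similarity.
    """
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)              # exclude the ith image itself
    nn_idx = np.argsort(-sims, axis=1)[:, :n]    # indices of the n nearest neighbours, shape (N, n)
    return feats[nn_idx].mean(axis=1)            # mean neighbour feature u_i, shape (N, d)

def objective(feats, n):
    """Average over i of the dot product x_i . u_i (the quantity to maximise)."""
    u = neighbour_means(feats, n)
    return np.mean(np.sum(feats * u, axis=1))

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1500, 512))   # stand-in for the secondary features
print(objective(train_feats, n=5))
```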

Let x_i be the feature representation of the ith image and u_i the mean of the feature representations of its n nearest neighbours.

The loss function is defined as:

$$L = \frac{1}{N}\sum_{i=1}^{N} x_i^{\top} u_i$$

where N is the number of training images (1,500 here).

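The optimisation problem above is solved with gradient-based updates; since the loss is maximised, this is gradient ascent on the loss (equivalently, gradient descent on its negative). The gradient is: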

[Figure: gradient]
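The original gradient image is not recoverable here. As a hedged reconstruction, assume (a) the trainable last layer is linear, x_i = W h_i, where h_i is the fixed output of the frozen layers, and (b) each neighbour mean u_i is held constant during a step. Neither assumption is stated in the original. Under them, the gradient of L = (1/N) Σ x_iᵀ u_i is:

$$
\frac{\partial L}{\partial x_i} = \frac{1}{N}\,u_i,
\qquad
L(W) = \frac{1}{N}\sum_{i=1}^{N} u_i^{\top} W h_i
\;\Rightarrow\;
\frac{\partial L}{\partial W} = \frac{1}{N}\sum_{i=1}^{N} u_i\, h_i^{\top}
$$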

The update rule for the vth iteration is given by:

[Figure: update rule]
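A hypothetical NumPy sketch of one gradient-ascent step, W^(v+1) = W^(v) + lr · ∂L/∂W, under assumptions not stated in the original: the trainable last layer is linear (x_i = W h_i), the rows of `H` are the frozen-layer outputs h_i, and the rows of `U` are the neighbour means u_i, held fixed during the step. The names `ascent_step`, `linear_objective` and `lr` are illustrative:

```python
import numpy as np

def ascent_step(W, H, U, lr):
    """One gradient-ascent step on L(W) = mean_i u_i . (W h_i), with U held fixed."""
    N = H.shape[0]
    grad = U.T @ H / N          # (1/N) * sum_i u_i h_i^T, shape (d_out, d_in)
    return W + lr * grad        # W^(v+1) = W^(v) + lr * dL/dW

def linear_objective(W, H, U):
    """L(W) = (1/N) * sum_i x_i . u_i with x_i = W h_i."""
    X = H @ W.T                 # rows are x_i = W h_i, shape (N, d_out)
    return np.mean(np.sum(X * U, axis=1))

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 8))    # 20 toy images, 8-d frozen-layer outputs
U = rng.normal(size=(20, 4))    # matching 4-d neighbour means
W = rng.normal(size=(4, 8))
W_next = ascent_step(W, H, U, lr=0.1)
print(linear_objective(W, H, U), linear_objective(W_next, H, U))
```

Because L is linear in W under these assumptions, each step raises it by exactly lr · ‖∂L/∂W‖², and nothing stops ‖W‖ from growing without bound; a practical version would need to normalise the features or constrain W, which this sketch does not do.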

The Top-N score was used as the evaluation metric. It is the average number of same-object images among the top N ranked images for a query.
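A NumPy sketch of the Top-N score, assuming "same object" means "same class label" and that results are ranked by cosine similarity; `top_n_score` and its arguments are hypothetical names, and the toy vectors are made up for illustration:

```python
import numpy as np

def top_n_score(query_feats, query_labels, gallery_feats, gallery_labels, n):
    """Average number of same-label images among the top-n results, over all queries."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                                  # (num_queries, num_gallery) cosine similarities
    top = np.argsort(-sims, axis=1)[:, :n]          # top-n gallery indices for each query
    hits = gallery_labels[top] == query_labels[:, None]
    return hits.sum(axis=1).mean()                  # mean count of same-label hits per query

gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
gallery_labels = np.array([0, 0, 1, 1])
queries = np.array([[1.0, 0.05], [0.05, 1.0]])
print(top_n_score(queries, np.array([0, 1]), gallery, gallery_labels, 2))   # 2.0
```

With n = 2 each query retrieves both gallery images of its own class, so the score reaches its maximum of n.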

Coders Packet
