Coders Packet

Semantic Segmentation

By Harsh Kumar

This project implements semantic segmentation in Python, where each pixel of an image is assigned to a class.

In semantic segmentation, each pixel is assigned to a class, and pixels of the same class share similar characteristics. Semantic segmentation models typically use an encoder and a decoder. The image is first fed to the encoder, which downsamples it and extracts features. The resulting feature map is then fed to the decoder, which upsamples the features and generates a pixel-wise label map.
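To make the idea of a pixel-wise label map concrete, here is a toy sketch in plain Python. The class names and scores are made up for illustration and are not taken from the project; the point is only that every pixel ends up with the index of its highest-scoring class.

```python
# Toy illustration: turn per-pixel class scores into a pixel-wise label map.
# The class names and scores below are hypothetical.
CLASS_NAMES = ["background", "road", "car"]

# A 2x2 "image": each pixel holds one score per class.
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.5, 0.3], [0.1, 0.2, 0.7]],
]


def to_label_map(score_grid):
    """For every pixel, pick the index of the class with the highest score."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in score_grid
    ]


label_map = to_label_map(scores)
print(label_map)  # [[0, 1], [1, 2]]
```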

Encoder Decoder

- Encoder

It is a CNN without fully connected layers that aggregates low-level features into high-level features.

- Decoder

It replaces the fully connected layers of a CNN and upsamples the feature map back to the original image size to generate a pixel mask.


A fully convolutional network (FCN) is used, which contains only convolutional and pooling layers and no fully connected layers.

FCN architecture


The encoder has 5 blocks, and each block has 2 convolutional layers followed by a pooling layer. The functional API is used to build the model.


The block function first loops over the number of convolutions, taking the number of filters and the kernel size as arguments, and then adds a pooling layer, taking the pool size and pool stride as arguments.



Encoder code

Inside the encoder function, the block function is called 5 times, each time with a different number of filters. The ReLU activation function is used.
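The effect of the five blocks on the feature map size can be sketched with plain Python shape arithmetic. This is not the project's model code: it only tracks shapes, and it assumes a 224 x 224 RGB input, 'same'-padded 3 x 3 convolutions, 2 x 2 pooling with stride 2, and hypothetical filter counts of 32, 64, 128, 256 and 512.

```python
def block_output_shape(shape, n_convs, filters, kernel_size, pool_size, pool_stride):
    """Return the (height, width, channels) shape after one encoder block.

    Assumes 'same'-padded convolutions, so kernel_size does not affect the
    height and width, and unpadded pooling.
    """
    height, width, channels = shape
    for _ in range(n_convs):
        # A 'same'-padded convolution only changes the channel count.
        channels = filters
    # Pooling shrinks the height and width.
    height = (height - pool_size) // pool_stride + 1
    width = (width - pool_size) // pool_stride + 1
    return (height, width, channels)


def encoder_shapes(input_shape, filter_counts=(32, 64, 128, 256, 512)):
    """Apply five blocks and collect the shape after each pooling layer."""
    shape = input_shape
    pooled = []
    for filters in filter_counts:
        shape = block_output_shape(
            shape, n_convs=2, filters=filters,
            kernel_size=3, pool_size=2, pool_stride=2,
        )
        pooled.append(shape)
    return pooled


shapes = encoder_shapes((224, 224, 3))
for index, shape in enumerate(shapes, start=1):
    print(f"p{index}: {shape}")
```

Under these assumptions each block halves the height and width, so the last pooling output is 7 x 7.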


Decoder code


The decoder function receives as arguments the outputs of all the pooling layers in the encoder. A Conv2D transpose layer upscales the output of the bottleneck layer, increasing its height and width. Cropping then brings the result back to the right size. Next, a 1 by 1 convolution is applied to f4 just to get the right number of classes, and the result is added to the upsampled layer. This process continues until the last Conv2D transpose layer, which uses a kernel size of 8 so that the output has the same dimensions as the input image. A softmax activation function is used to classify each pixel into a class.
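The way upsampling and cropping line the sizes up can be checked with a little arithmetic. The sketch below is only a shape calculation under assumptions that the text does not state: a 224 x 224 input (so f3, f4 and the bottleneck are 28, 14 and 7 pixels wide), unpadded transposed convolutions, a kernel size of 4 with stride 2 for the first two upsampling steps, a crop of 1 pixel per side, and a stride of 8 for the final layer. Only the final kernel size of 8 comes from the description above.

```python
def conv_transpose_size(size, kernel_size, stride):
    """Output height/width of an unpadded transposed convolution."""
    return (size - 1) * stride + kernel_size


def crop(size, per_side):
    """Height/width left after cropping per_side pixels from each edge."""
    return size - 2 * per_side


# Assumed encoder output sizes for a 224x224 input.
f3, f4, f5 = 28, 14, 7

# Upsample the bottleneck, then crop so it can be added to the 1x1-convolved f4.
up = crop(conv_transpose_size(f5, kernel_size=4, stride=2), per_side=1)
assert up == f4  # 7 -> 16 -> 14

# Repeat the step so the result can be added to the 1x1-convolved f3.
up = crop(conv_transpose_size(up, kernel_size=4, stride=2), per_side=1)
assert up == f3  # 14 -> 30 -> 28

# Final transposed convolution with kernel size 8 (stride 8 is an assumption).
out = conv_transpose_size(up, kernel_size=8, stride=8)
print(out)  # 224
```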

The entire architecture of the model is:

Model Architecture


We used IOU (Intersection over Union) and Dice Score as evaluation metrics, computed for each class:

Area of Overlap = Number of pixels assigned to the class in both the predicted segmentation mask and the true segmentation mask.

Combined Area = Number of pixels assigned to the class in the predicted segmentation mask plus the number assigned to it in the true segmentation mask.

Area of Union = Combined Area - Area of Overlap.

IOU = Area of Overlap / Area of Union.

Dice Score = 2 * Area of Overlap / Combined Area.
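As a worked example of these formulas, the following plain-Python sketch computes both metrics for one class on a pair of tiny masks. The function name `iou_and_dice` and the 3 x 3 masks are hypothetical and only illustrate the definitions above; this is not the project's evaluation code.

```python
def iou_and_dice(y_true, y_pred, class_id):
    """Compute IOU and Dice Score for one class from two label masks."""
    overlap = 0     # pixels labelled class_id in both masks
    true_count = 0  # pixels labelled class_id in the true mask
    pred_count = 0  # pixels labelled class_id in the predicted mask
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            true_count += t == class_id
            pred_count += p == class_id
            overlap += t == class_id and p == class_id
    combined = true_count + pred_count
    if combined == 0:
        # The class appears in neither mask, so both scores are undefined.
        return None, None
    union = combined - overlap
    return overlap / union, 2 * overlap / combined


# Hypothetical 3x3 masks with classes 0, 1 and 2.
y_true = [[0, 0, 1],
          [0, 1, 1],
          [2, 2, 1]]
y_pred = [[0, 1, 1],
          [0, 1, 1],
          [2, 0, 1]]

iou, dice = iou_and_dice(y_true, y_pred, class_id=1)
print(f"IOU = {iou:.3f}, Dice = {dice:.3f}")  # IOU = 0.800, Dice = 0.889
```

For class 1 the overlap is 4 pixels and the combined area is 4 + 5 = 9, so the union is 5, the IOU is 4/5 and the Dice Score is 8/9.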


Segmentation masks



