Introduction to Convolutional Neural Networks (CNNs)

What Is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly used in computer vision. Computer vision is a field of artificial intelligence that enables computers to understand and interpret images and other visual data.

Artificial neural networks perform very well on many machine learning tasks and are used with many kinds of data, such as images, audio, and text. Different types of neural networks suit different purposes: for predicting a sequence of words we use recurrent neural networks (more precisely, an LSTM), while for image classification we use convolutional neural networks. In this blog, we are going to build a basic CNN.

In a regular Neural Network there are three types of layers:

1. Input layer: This is the layer through which we give input to our model. The number of neurons in this layer is equal to the total number of features in our data (the number of pixels in the case of an image).
2. Hidden layer: The input from the input layer is then fed into the hidden layer. There can be many hidden layers depending on our model and data size. Each hidden layer can have a different number of neurons, which need not equal the number of features. The output of each layer is computed by multiplying the output of the previous layer by that layer's learnable weight matrix, adding learnable biases, and then applying an activation function, which makes the network nonlinear.
3. Output layer: The output from the last hidden layer is then fed into a function such as sigmoid or softmax, which converts the raw score of each class into a probability for that class.

CNN architecture

A convolutional neural network consists of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers.

The Convolutional layer applies filters to the input image to extract features, the Pooling layer downsamples the image to reduce computation, and the fully connected layer makes the final prediction. The network learns the optimal filters through backpropagation and gradient descent.
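
To make the downsampling step concrete, here is a minimal NumPy sketch of 2×2 max pooling, the kind of pooling used later in this post. The function name max_pool_2x2 and the 4×4 feature map are assumptions made up for illustration; a real framework layer handles batches and channels as well.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2D array by taking the max of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd, so the blocks tile evenly.
    h, w = h - h % 2, w - w % 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 2]
#  [2 9]]
```

Each output value summarizes a 2×2 region, so the map shrinks from 4×4 to 2×2 while the strongest response in each region is kept.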

Here is an example of how a CNN recognizes an image:

In the diagram above, the pixels that are lit are those with a value of 1.

Layers in a Convolutional Neural Network

A convolutional neural network has multiple hidden layers that help in extracting information from an image. The four important layers in a CNN are:

Convolution layer
ReLU layer
Pooling layer
Fully connected layer

Convolution layer

This is the first step in the process of extracting valuable features from an image. A convolution layer has several filters that perform the convolution operation. Every image is considered as a matrix of pixel values.

Consider the following 5×5 image whose pixel values are either 0 or 1. There’s also a filter matrix with a dimension of 3×3. Slide the filter matrix over the image and compute the dot product to get the convolved feature matrix.
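
The sliding dot product described above can be sketched in a few lines of NumPy. The pixel and filter values below are illustrative assumptions, not taken from the figure, and convolve2d_valid is a hypothetical helper name. Like most deep learning libraries, this sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(window * kernel)
    return output

# Assumed 5x5 binary image and 3x3 filter, for illustration only.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature = convolve2d_valid(image, kernel)
print(feature)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

A 5×5 input and a 3×3 filter give a 3×3 convolved feature, because the filter fits in 5 - 3 + 1 = 3 positions along each axis.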

ReLU layer
ReLU stands for the rectified linear unit. Once the feature maps are extracted, the next step is to move them to a ReLU layer.

ReLU performs an element-wise operation and sets all the negative values to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map. The graph of the ReLU function is flat at 0 for negative inputs and rises as a straight line for positive inputs.
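
As a small numeric illustration, ReLU can be written as an element-wise maximum with 0. The relu helper and the 2×2 feature map below are assumptions for illustration only.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative values become 0, others pass through unchanged."""
    return np.maximum(x, 0)

# Hypothetical feature map containing negative values.
feature_map = np.array([
    [2, -1],
    [-4, 3],
])

rectified = relu(feature_map)
print(rectified)
# [[2 0]
#  [0 3]]
```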

The original image is scanned with multiple convolution and ReLU layers to locate the features:

1. Import libraries:

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

2. Load the dataset:

(X_train, y_train), (X_test,y_test) = datasets.cifar10.load_data()

X_train.shape

Output:

(50000, 32, 32, 3)

X_test.shape

Output:

(10000, 32, 32, 3)

y_train.shape

Output:

(50000, 1)

y_train[:5]

Output:

array([[6],
       [9],
       [9],
       [4],
       [1]], dtype=uint8)

y_train is a 2D array, but a 1D array is good enough for our classification, so we will convert it to a 1D array.

y_train = y_train.reshape(-1,)
y_train[:5]

Output:

array([6, 9, 9, 4, 1], dtype=uint8)

y_test = y_test.reshape(-1,)

classes = ["airplane","automobile","bird","cat","deer","dog","frog","horse","ship","truck"]

Let's plot some images:

def plot_sample(X, y, index):
    plt.figure(figsize = (15,2))
    plt.imshow(X[index])
    plt.xlabel(classes[y[index]])

plot_sample(X_train, y_train, 0)

Output: (image of the sample, labeled "frog")

plot_sample(X_train, y_train, 1)

Output: (image of the sample, labeled "truck")

Normalizing the training data

Pixel values range from 0 to 255, so dividing by 255 scales them to the range 0 to 1.

X_train = X_train / 255.0

X_test = X_test / 255.0
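
The effect of this division can be checked on a tiny array. This is a minimal sketch with made-up pixel values; the point is that the uint8 input becomes a floating-point array with values between 0 and 1.

```python
import numpy as np

# Hypothetical pixel values covering the full uint8 range.
pixels = np.array([0, 51, 255], dtype=np.uint8)

# Dividing by a float converts the integers to floating point and rescales them.
scaled = pixels / 255.0

print(scaled.min(), scaled.max())
# 0.0 1.0
```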

Build a simple artificial neural network for image classification

ann = models.Sequential([
        layers.Flatten(input_shape=(32,32,3)),
        layers.Dense(3000, activation='relu'),
        layers.Dense(1000, activation='relu'),
        layers.Dense(10, activation='softmax')    
    ])

ann.compile(optimizer='SGD',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

ann.fit(X_train, y_train, epochs=5)

Output:

Epoch 1/5
1563/1563 [==============================] - 2s 2ms/step - loss: 1.8074 - accuracy: 0.3561
Epoch 2/5
1563/1563 [==============================] - 2s 1ms/step - loss: 1.6208 - accuracy: 0.4285
Epoch 3/5
1563/1563 [==============================] - 2s 2ms/step - loss: 1.5380 - accuracy: 0.4585
Epoch 4/5
1563/1563 [==============================] - 2s 2ms/step - loss: 1.4808 - accuracy: 0.4806
Epoch 5/5
1563/1563 [==============================] - 2s 2ms/step - loss: 1.4326 - accuracy: 0.4928


We can see that the training accuracy is around 49% after 5 epochs.

from sklearn.metrics import confusion_matrix , classification_report
import numpy as np
y_pred = ann.predict(X_test)
y_pred_classes = [np.argmax(element) for element in y_pred]

print("Classification Report: \n", classification_report(y_test, y_pred_classes))

Out [ ]:

Classification Report: 
               precision    recall  f1-score   support

           0       0.63      0.45      0.53      1000
           1       0.72      0.46      0.56      1000
           2       0.33      0.46      0.39      1000
           3       0.36      0.25      0.29      1000
           4       0.44      0.37      0.40      1000
           5       0.34      0.46      0.39      1000
           6       0.56      0.47      0.51      1000
           7       0.39      0.67      0.50      1000
           8       0.64      0.60      0.62      1000
           9       0.59      0.53      0.55      1000

    accuracy                           0.47     10000
   macro avg       0.50      0.47      0.47     10000
weighted avg       0.50      0.47      0.47     10000

Now let us build a convolutional neural network to classify our images

cnn = models.Sequential([
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])




cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])




cnn.fit(X_train, y_train, epochs=10)

Out[ ]:

Epoch 1/10
1563/1563 [==============================] - 2s 2ms/step - loss: 1.4407 - accuracy: 0.4810
Epoch 2/10
1563/1563 [==============================] - 2s 2ms/step - loss: 1.1084 - accuracy: 0.6109
Epoch 3/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.9895 - accuracy: 0.6574
Epoch 4/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.9071 - accuracy: 0.6870
Epoch 5/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.8416 - accuracy: 0.7097
Epoch 6/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.7847 - accuracy: 0.7262
Epoch 7/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.7350 - accuracy: 0.7448
Epoch 8/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.6941 - accuracy: 0.7574
Epoch 9/10
1563/1563 [==============================] - 2s 1ms/step - loss: 0.6516 - accuracy: 0.7731
Epoch 10/10
1563/1563 [==============================] - 2s 2ms/step - loss: 0.6187 - accuracy: 0.7836

With the CNN, training accuracy was around 71% at the end of 5 epochs and around 78% after 10, which is a significant improvement over the ANN. CNNs are well suited to image classification and give much better accuracy here. Computation is also lower than in the simple ANN, as max pooling reduces the image dimensions while still preserving the important features.
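
One rough way to compare the size of the two models is to count their trainable parameters. The sketch below does the arithmetic by hand for the layer sizes used in this post; dense_params and conv2d_params are hypothetical helper names. It assumes each layer has one bias per output unit or filter, and that the convolutions use no padding. Parameter count is only a proxy for computation, not a measurement of it.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output neuron."""
    return n_in * n_out + n_out

def conv2d_params(kernel_h, kernel_w, channels_in, filters):
    """One kernel per (input channel, filter) pair plus one bias per filter."""
    return kernel_h * kernel_w * channels_in * filters + filters

# ANN: Flatten(32*32*3) -> Dense(3000) -> Dense(1000) -> Dense(10)
ann_params = (
    dense_params(32 * 32 * 3, 3000)
    + dense_params(3000, 1000)
    + dense_params(1000, 10)
)

# CNN spatial sizes: 32 -> 30 (3x3 conv) -> 15 (2x2 pool) -> 13 (3x3 conv) -> 6 (2x2 pool)
cnn_params = (
    conv2d_params(3, 3, 3, 32)
    + conv2d_params(3, 3, 32, 64)
    + dense_params(6 * 6 * 64, 64)
    + dense_params(64, 10)
)

print(ann_params)  # 12230010
print(cnn_params)  # 167562
```

Under these assumptions the CNN has roughly 70 times fewer parameters, mostly because the first dense layer of the ANN connects every one of the 3072 input values to 3000 neurons.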

cnn.evaluate(X_test,y_test)

Out [ ]:

[0.9021560549736023, 0.7027999758720398]

y_pred = cnn.predict(X_test)

y_pred[:5]

Out [ ]:

array([[4.3996371e-04, 3.4844263e-05, 1.5558505e-03, 8.8400185e-01,
        1.9452239e-04, 3.5314459e-02, 7.2777577e-02, 6.9044131e-06,
        5.6417785e-03, 3.2224660e-05],
       [8.1062522e-03, 5.0841425e-02, 1.2453231e-07, 5.3348430e-07,
        9.1728407e-07, 1.0009186e-08, 2.8985988e-07, 1.7532484e-09,
        9.4089705e-01, 1.5346886e-04],
       [1.7055811e-02, 1.1841061e-01, 4.6799007e-05, 2.7727904e-02,
        1.0848254e-03, 1.0896578e-03, 1.3575243e-04, 2.8652203e-04,
        7.8895986e-01, 4.5202184e-02],
       [3.1300801e-01, 1.1591638e-02, 1.1511055e-02, 3.9592334e-03,
        7.7280165e-03, 5.6289224e-05, 2.3531138e-04, 9.4204297e-06,
        6.5178138e-01, 1.1968113e-04],
       [1.3230885e-05, 2.1221960e-05, 9.2594400e-02, 3.3585075e-02,
        4.4722903e-01, 4.1028224e-03, 4.2241842e-01, 2.8064171e-05,
        6.6392668e-06, 1.0745022e-06]], dtype=float32)
 
y_classes = [np.argmax(element) for element in y_pred]
y_classes[:5]

Out[]:

[3, 8, 8, 8, 4]

y_test[:5]

Out[]:

array([3, 8, 8, 0, 6], dtype=uint8)

plot_sample(X_test, y_test,3)

Out[]:

classes[y_classes[3]]

Out[]:

'ship'
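
The conversion from probability rows to class labels can also be done in one vectorized call. The sketch below uses the first and fourth rows of the y_pred output printed earlier, rounded to four decimals, so it runs without the trained model; the variable names are assumptions for illustration.

```python
import numpy as np

classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

# Rows 0 and 3 of the y_pred output printed earlier, rounded to 4 decimals.
probs = np.array([
    [0.0004, 0.0000, 0.0016, 0.8840, 0.0002, 0.0353, 0.0728, 0.0000, 0.0056, 0.0000],
    [0.3130, 0.0116, 0.0115, 0.0040, 0.0077, 0.0001, 0.0002, 0.0000, 0.6518, 0.0001],
])

# argmax along axis 1 picks the most probable class index for each row.
predicted = np.argmax(probs, axis=1)
labels = [classes[i] for i in predicted]
print(labels)
# ['cat', 'ship']
```

The true labels for these two test images are 3 and 0, that is "cat" and "airplane", so the first prediction is correct and the second is a misclassification.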

Advantages of Convolutional Neural Networks (CNNs):

Good at detecting patterns and features in images, videos, and audio signals.
Fairly robust to translation of features within an image; robustness to rotation and scaling usually requires data augmentation.
End-to-end training, no need for manual feature extraction.
Can handle large amounts of data and achieve high accuracy.

Disadvantages of Convolutional Neural Networks (CNNs):

Computationally expensive to train and require a lot of memory.
Can be prone to overfitting if not enough data or proper regularization is used.
Requires large amounts of labeled data.
Interpretability is limited; it is hard to understand what the network has learned.
