Introduction to the Convolutional Neural Network (CNN)

What is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to understand and interpret images and other visual data.

When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural Networks are used on various kinds of data, such as images, audio, and text. Different types of Neural Networks are used for different purposes: for predicting a sequence of words we use a Recurrent Neural Network, more precisely an LSTM; similarly, for image classification we use a Convolutional Neural Network. In this blog, we are going to build up the basic building blocks of a CNN.

In a regular Neural Network there are three types of layers:

  1. Input layer: This is the layer in which we give input to our model. The number of neurons in this layer is equal to the total number of features in our data (the number of pixels in the case of an image).
  2. Hidden layer: The input from the input layer is then fed into the hidden layers. There can be many hidden layers depending on our model and data size, and each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by multiplying the output of the previous layer by that layer's learnable weights, adding learnable biases, and then applying an activation function, which makes the network nonlinear (see the short sketch after this list).

  3. Output layer: The output from the hidden layers is fed into a logistic function such as sigmoid or softmax, which converts the output for each class into a probability score for that class.

CNN Architecture

A Convolutional Neural Network consists of multiple layers, such as the input layer, convolutional layer, pooling layer, and fully connected layer.

 

The convolutional layer applies filters to the input image to extract features, the pooling layer downsamples the image to reduce computation, and the fully connected layer makes the final prediction. The network learns the optimal filters through backpropagation and gradient descent.

Here is an example to show how a CNN recognizes an image:

As we can see in the diagram above, the pixels that have a value of 1 are lit.

Layers in a Convolutional Neural Network

 

A convolutional neural network has multiple hidden layers that help in extracting information from an image. The four important layers in a CNN are:

  1. Convolution layer
  2. ReLU layer
  3. Pooling layer
  4. Fully connected layer

CONVOLUTION LAYER

 

This is the first step in the process of extracting valuable features from an image. A convolution layer has several filters that perform the convolution operation. Every image is considered as a matrix of pixel values.

 

 

 

Consider the following 5×5 image whose pixel values are either 0 or 1. There's also a filter matrix with a dimension of 3×3. Slide the filter matrix over the image and compute the dot product to get the convolved feature matrix.

ReLU layer

ReLU stands for rectified linear unit. Once the feature maps are extracted, the next step is to move them to a ReLU layer.

ReLU performs an element-wise operation and sets all the negative pixels to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map. The ReLU function is f(x) = max(0, x): its graph is zero for all negative inputs and linear for positive inputs.

 

The original image is scanned with multiple convolution and ReLU layers to locate the features. Let us now put this into practice by building and training networks on the CIFAR-10 dataset:

1. Import Libraries:

 

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

2. Load the Dataset:

(X_train, y_train), (X_test,y_test) = datasets.cifar10.load_data()

X_train.shape

Output:

(50000, 32, 32, 3)

X_test.shape

Output:

(10000, 32, 32, 3)

y_train.shape

Output:

(50000, 1)

y_train[:5]

Output:

array([[6],
       [9],
       [9],
       [4],
       [1]], dtype=uint8)

y_train is a 2D array; for our classification task a 1D array is good enough, so we will convert it to a 1D array.

y_train = y_train.reshape(-1,)
y_train[:5]

Output:

array([6, 9, 9, 4, 1], dtype=uint8)

y_test = y_test.reshape(-1,)

classes = ["airplane","automobile","bird","cat","deer","dog","frog","horse","ship","truck"]

Let's plot some images:

def plot_sample(X, y, index):
    plt.figure(figsize = (15,2))
    plt.imshow(X[index])
    plt.xlabel(classes[y[index]])

plot_sample(X_train, y_train, 0)

plot_sample(X_train, y_train, 1)

Normalizing the data

 

X_train = X_train / 255.0

X_test = X_test / 255.0

Build a simple artificial neural network for image classification

 

ann = models.Sequential([
        layers.Flatten(input_shape=(32,32,3)),
        layers.Dense(3000, activation='relu'),
        layers.Dense(1000, activation='relu'),
        layers.Dense(10, activation='softmax')    
    ])

ann.compile(optimizer='SGD',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

ann.fit(X_train, y_train, epochs=5)

Output:

We can see that the accuracy is around 49% after 5 epochs of training.
from sklearn.metrics import confusion_matrix , classification_report
import numpy as np
y_pred = ann.predict(X_test)
y_pred_classes = [np.argmax(element) for element in y_pred]

print("Classification Report: \n", classification_report(y_test, y_pred_classes))

Out [ ]:

Classification Report: 
               precision    recall  f1-score   support

           0       0.63      0.45      0.53      1000
           1       0.72      0.46      0.56      1000
           2       0.33      0.46      0.39      1000
           3       0.36      0.25      0.29      1000
           4       0.44      0.37      0.40      1000
           5       0.34      0.46      0.39      1000
           6       0.56      0.47      0.51      1000
           7       0.39      0.67      0.50      1000
           8       0.64      0.60      0.62      1000
           9       0.59      0.53      0.55      1000

    accuracy                           0.47     10000
   macro avg       0.50      0.47      0.47     10000
weighted avg       0.50      0.47      0.47     10000

Now let us build a convolutional neural network to train on our images:

cnn = models.Sequential([
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])




cnn.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])




cnn.fit(X_train, y_train, epochs=10)

Out[ ]:

cnn.evaluate(X_test,y_test)

Out [ ]:

[0.9021560549736023, 0.7027999758720398]

y_pred = cnn.predict(X_test)

y_pred[:5]

Out [ ]:

array([[4.3996371e-04, 3.4844263e-05, 1.5558505e-03, 8.8400185e-01,
        1.9452239e-04, 3.5314459e-02, 7.2777577e-02, 6.9044131e-06,
        5.6417785e-03, 3.2224660e-05],
       [8.1062522e-03, 5.0841425e-02, 1.2453231e-07, 5.3348430e-07,
        9.1728407e-07, 1.0009186e-08, 2.8985988e-07, 1.7532484e-09,
        9.4089705e-01, 1.5346886e-04],
       [1.7055811e-02, 1.1841061e-01, 4.6799007e-05, 2.7727904e-02,
        1.0848254e-03, 1.0896578e-03, 1.3575243e-04, 2.8652203e-04,
        7.8895986e-01, 4.5202184e-02],
       [3.1300801e-01, 1.1591638e-02, 1.1511055e-02, 3.9592334e-03,
        7.7280165e-03, 5.6289224e-05, 2.3531138e-04, 9.4204297e-06,
        6.5178138e-01, 1.1968113e-04],
       [1.3230885e-05, 2.1221960e-05, 9.2594400e-02, 3.3585075e-02,
        4.4722903e-01, 4.1028224e-03, 4.2241842e-01, 2.8064171e-05,
        6.6392668e-06, 1.0745022e-06]], dtype=float32)
 
Each row of y_pred is a probability distribution over the 10 classes, so we take the argmax to get the predicted class for each image:

y_classes = [np.argmax(element) for element in y_pred]
y_classes[:5]

Out[]:

[3, 8, 8, 8, 4]

y_test[:5]

Out[]:

array([3, 8, 8, 0, 6], dtype=uint8)

plot_sample(X_test, y_test, 3)

classes[y_classes[3]]

Out[]:

'ship'
  Advantages of Convolutional Neural Networks (CNNs):
  1. Good at detecting patterns and features in images, videos, and audio signals.
  2. Robust to translation, rotation, and scaling of the input (invariance).
  3. End-to-end training, no need for manual feature extraction.
  4. Can handle large amounts of data and achieve high accuracy.

 

  Disadvantages of Convolutional Neural Networks (CNNs):
  1. Computationally expensive to train and require a lot of memory.
  2. Can be prone to overfitting if there is not enough data or if proper regularization is not used.
  3. Requires large amounts of labeled data.
  4. Interpretability is limited; it's hard to understand what the network has learned.

 

 

 

 

 
