By ALBINUS B
The hand digit is captured, processed, and predicted using a trained model. TensorFlow and OpenCV are the main libraries used, and the project is written in Python.
This is my version of the Python code for predicting the hand digit shown. It lets the user capture a picture of the sign and have the program guess which digit it is using the trained model. The model is a convolutional neural network trained on a dataset of 12,713 images.
Stage 1: Training the model
A dataset of 12,713 images was downloaded and split into 9,081 images for training and 3,632 images for testing. The dataset was loaded with flow_from_directory on an ImageDataGenerator (a sketch of this is shown below). A sequential convolutional neural network with 7 layers was built, with softmax activation in the last layer since the prediction is one of six classes {None, one, two, three, four, five}. The training images were of size 300 x 300.
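A minimal sketch of how the data loading might look, assuming one sub-folder per class and grayscale images (the directory names, batch size, and rescaling factor here are assumptions, not taken from the packet):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1]; the train/ and test/ folder split is assumed
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'dataset/train',            # assumed path: one sub-folder per class
    target_size=(300, 300),     # images are resized to 300 x 300
    color_mode='grayscale',     # the model expects a single channel
    batch_size=32,              # assumed batch size
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    'dataset/test',             # assumed path
    target_size=(300, 300),
    color_mode='grayscale',
    batch_size=32,
    class_mode='categorical')

The seven-layer network itself: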
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
#1st layer
model.add(Conv2D(16, (3, 3), input_shape=(300, 300, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
#2nd layer
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
#3rd layer
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
#4th layer
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
#5th layer
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
#6th layer
model.add(Flatten())
model.add(Dense(512, activation='relu'))
#7th (last) layer
model.add(Dense(6, activation='softmax'))
The model was trained with the fit_generator function for 40 epochs, reaching an accuracy of 80%, and was saved as OH-HO2.h5.
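A rough sketch of the training step, assuming the generators from the loading sketch above (the optimizer, steps_per_epoch, and validation settings are assumptions; fit_generator is deprecated in newer TensorFlow releases in favour of fit):

# Compile with categorical cross-entropy since the labels are one-hot encoded
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train for 40 epochs; the step counts are derived from the generator sizes
model.fit_generator(train_generator,
                    steps_per_epoch=len(train_generator),
                    epochs=40,
                    validation_data=test_generator,
                    validation_steps=len(test_generator))

# Save the trained model for use in the prediction stage
model.save('OH-HO2.h5')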
Stage 2: Capturing the image and processing it
First, webcam = cv2.VideoCapture(0) is initialized and webcam.read() is called inside a while loop to display a live video feed. Press 's' to save the current frame or 'q' to exit. For the sake of simplicity, the image is saved as soon as 's' is pressed.
Second, the saved image is read in grayscale and a Gaussian blur is applied. Note that the kernel size must be positive and odd, and the kernel must be square. Then an adaptive threshold is computed over an 11 x 11 neighbourhood of pixels.
---- Capturing the image ----
import cv2

# Open the default webcam
webcam = cv2.VideoCapture(0)

while True:
    try:
        # Read a frame and show the live feed
        check, frame = webcam.read()
        cv2.imshow("Capturing", frame)
        key = cv2.waitKey(1)
        if key == ord('s'):
            # Save the current frame, show it briefly, then leave the loop
            cv2.imwrite(filename='saved_img.PNG', img=frame)
            webcam.release()
            img_new = cv2.imread('saved_img.PNG', cv2.IMREAD_GRAYSCALE)
            cv2.imshow("Captured Image", img_new)
            cv2.waitKey(1650)
            cv2.destroyAllWindows()
            print("Processing image...")
            break
        elif key == ord('q'):
            print("Turning off camera.")
            webcam.release()
            print("Camera off.")
            print("Program ended.")
            cv2.destroyAllWindows()
            break
    except KeyboardInterrupt:
        print("Turning off camera.")
        webcam.release()
        print("Camera off.")
        print("Program ended.")
        cv2.destroyAllWindows()
        break
---- Processing the image ----
import numpy as np

# Read the saved image in grayscale and smooth it with a 9 x 9 Gaussian blur
img_ = cv2.imread('saved_img.PNG', cv2.IMREAD_GRAYSCALE)
dst = cv2.GaussianBlur(img_.copy(), (9, 9), 0)
# Adaptive Gaussian threshold over an 11 x 11 neighbourhood, then invert the result
process = cv2.adaptiveThreshold(dst, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
process = cv2.bitwise_not(process, process)
# Dilate with a cross-shaped kernel to thicken the strokes of the hand
kernel = np.array([[0., 1., 0.], [1., 1., 1.], [0., 1., 0.]], np.uint8)
process = cv2.dilate(process, kernel)
Stage 3: Predicting the image
The model is loaded using load_model from tensorflow.keras.models. The processed image is resized to 300 x 300, reshaped to match the model's input shape of (1, 300, 300, 1), and passed to the predict function. The predicted class is printed.
from tensorflow.keras.models import load_model

img = cv2.resize(process, (300, 300), interpolation=cv2.INTER_AREA)
model = load_model('OH-HO2.h5')
# classes holds the six label names in the order assigned by the training generator
pred = classes[np.argmax(model.predict(img.reshape(1, 300, 300, 1), batch_size=1))]
print(pred)
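The classes list has to line up with the model's outputs: flow_from_directory assigns class indices in alphanumeric order of the folder names, so a safe way to build the list is directly from the training generator rather than typing it by hand (a sketch, assuming the train_generator from the loading step in Stage 1, built at training time and reused here):

# Label names in index order, so classes[i] matches output neuron i of the model
classes = sorted(train_generator.class_indices, key=train_generator.class_indices.get)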
Deliverables:
Note: I couldn't upload the dataset since it's very large.
Submitted by ALBINUS B (ALBINUSB)