This packet involves the use of Pytesseract, a powerful OCR (Optical Character Recognition) tool in Python, to recognize and classify characters in a live video frame.
This packet uses two Python modules: cv2 for capturing and reading the live stream/frame, and Pytesseract for character recognition and extraction. To continue, first install Tesseract (and the pytesseract wrapper) in a virtual environment.
import cv2
import pytesseract
from pytesseract import Output
Pytesseract's Output class is used to get information about each recognized word: its position (left, top, width, height), confidence, language, etc. cv2 handles frame reading, processing, and optimization for easier recognition.
Capturing, Reading, and Processing frame:
cap = cv2.VideoCapture(0)

while True:
    # Reading the frame
    ret, frame = cap.read()
    # Converting into a gray frame
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
The above lines of code open the live stream in the 'cap' object and read each frame into the 'frame' variable. The frame is converted to grayscale to make it uniform before it is fed into Tesseract for text extraction.
    # Extracting the data from the frame as a dictionary
    frame_data = pytesseract.image_to_data(gray_frame, output_type=Output.DICT)
The above code extracts the text information into the 'frame_data' variable. The data comes back as a dictionary with keys such as 'text', 'left', 'top', 'width', 'height', and 'conf'. It describes every word detected in the frame along with its positional coordinates.
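The structure of this dictionary can be illustrated without a camera. The sample below is a hypothetical, made-up stand-in for what image_to_data might return for a frame containing two words; the filtering logic mirrors the loop used later in this packet:

```python
# Hypothetical sample of the dictionary returned by
# pytesseract.image_to_data(..., output_type=Output.DICT).
# All values below are made up for illustration.
frame_data = {
    'text':   ['', 'Hello', 'World'],
    'left':   [0, 15, 120],
    'top':    [0, 40, 40],
    'width':  [640, 90, 95],
    'height': [480, 30, 30],
    'conf':   ['-1', '87', '9'],   # -1 marks non-word (block/line) entries
}

# Keep only words whose confidence exceeds 10%, as in the main loop
recognized = []
for i in range(len(frame_data['text'])):
    if int(frame_data['conf'][i]) > 10:
        word = frame_data['text'][i].strip()
        x, y = frame_data['left'][i], frame_data['top'][i]
        recognized.append((word, x, y))

print(recognized)  # only 'Hello' passes the 10% confidence filter
```

Entries with a confidence of -1 correspond to structural elements (blocks, lines) rather than words, which is why the filter also discards them.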
Showing text on the frame:
    # Setting the coordinates of the scanned text
    for i in range(len(frame_data['text'])):
        # x -> coordinate from left, y -> top coordinate,
        # w -> width of the text, h -> height of the text
        x = frame_data['left'][i]
        y = frame_data['top'][i]
        w = frame_data['width'][i]
        h = frame_data['height'][i]
        accuracy = frame_data['conf'][i]
        # Showing data only if the confidence is more than 10%
        # You can raise this threshold, but it highly depends upon the
        # quality of the scanned frame and the data
        if int(accuracy) > 10:
            # Setting the text
            text = frame_data['text'][i].strip()
            # Placing the text on the frame
            cv2.putText(frame, text, (x, y - 20), cv2.FONT_HERSHEY_COMPLEX,
                        1, (0, 0, 255), 2, cv2.LINE_AA)

    # Showing the frame
    cv2.imshow("Text Frame", frame)
    if cv2.waitKey(1) & 0xff == ord('q'):
        break

# Releasing the capture and closing the frame window
cap.release()
cv2.destroyAllWindows()
The above code stores the left and top coordinates, as well as the width and height, to get each word's boundary. Next, 'accuracy' holds the confidence with which the word was recognized. If the confidence is greater than 10%, the word is stripped of surrounding whitespace and drawn on the frame at its coordinates. You can raise this threshold, but then preprocessing techniques such as thresholding or Gaussian blur need to be applied to get a more saturated, less noisy frame.
After that, the frame is displayed with the recognized text overlaid on it. The program exits only when 'q' is pressed, as given by ord('q'); otherwise the capturing continues indefinitely.
Finally, the capture is released and the frame window is closed. With strong preprocessing algorithms and better saturation techniques, this approach can be used for document scanners.
Submitted by Rachit R Jindal (rachit99)
Download packets of source code on Coders Packet