Text recognition and extraction from an image using Python pytesseract.

text_recog.py

In this project, we will learn how to recognize and extract text from a given image using Python. We use OpenCV to load the image and the pytesseract module to extract the text.

Description:

The Python code in this packet recognizes and extracts text from an image.
We use the pytesseract module to recognize and extract the text, and OpenCV to load the image.
The extracted text is saved as a .txt file.

Installing required libraries:

Steps to install OpenCV

Open the command prompt and run the "pip install opencv-python" command to install OpenCV.

Follow this link to install Python and OpenCV.

Steps to install pytesseract

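Go to this link and download the Tesseract installer suitable for your system.

After downloading the setup file, run it. pytesseract is a Python wrapper around the Tesseract OCR engine, so the Python package itself is installed separately with the "pip install pytesseract" command.

Follow this video to complete the installation. You have to add the path of the Tesseract executable to your system environment variables.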
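If editing the system environment is not convenient, pytesseract can also be pointed at the executable from Python through its tesseract_cmd setting. The following is a minimal sketch of that approach, not part of the original project: the helper names find_tesseract and configure_pytesseract are hypothetical, and any fallback location passed in is an assumption about where Tesseract was installed.

```python
import shutil
from pathlib import Path


def find_tesseract(candidates=()):
    """Return the path of the tesseract executable, or None if not found."""
    # First look on the system PATH, as set up in the installation step.
    found = shutil.which("tesseract")
    if found:
        return found
    # Then try explicit locations supplied by the caller (assumed paths).
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def configure_pytesseract(candidates=()):
    """Point pytesseract at the executable and return the path used."""
    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    # Imported here so the lookup helper works even without pytesseract.
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

On Windows, for example, this might be called as configure_pytesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"]); that path is only a common default and may differ on your machine.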

Input Image:

This image is given as input.

Code:

Importing the required libraries

# Import the required libraries (OpenCV and pytesseract)
import cv2
import pytesseract

Loading the input image

# Read the image from disk
img = cv2.imread('img.jpg')
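Note that cv2.imread does not raise an error when the file is missing or unreadable; it returns None. OpenCV also stores pixels in BGR order, while pytesseract assumes RGB. A hedged sketch of a more defensive loader follows; the function name load_image_rgb is hypothetical, and the conversion step is an optional addition rather than something the original code does.

```python
import os


def load_image_rgb(path):
    """Load an image with OpenCV and return it in RGB channel order."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"image not found: {path}")
    # Imported here so the path check above runs even without OpenCV.
    import cv2
    img = cv2.imread(path)
    # cv2.imread signals failure by returning None instead of raising.
    if img is None:
        raise ValueError(f"could not decode image: {path}")
    # Convert from OpenCV's BGR order to the RGB order pytesseract assumes.
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
```

With this helper, the loading step could be written as img = load_image_rgb('img.jpg').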

pytesseract.image_to_string runs text recognition on the image and returns the extracted text as a string.

# Extract the text in the image
text = pytesseract.image_to_string(img, lang='eng')
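The returned string may contain blank lines and stray whitespace, and some Tesseract versions end the output with a form-feed character. If that is unwanted, the text can be tidied before saving. This is an optional sketch; clean_ocr_text is a hypothetical helper, not part of pytesseract or the original code.

```python
def clean_ocr_text(text):
    """Strip surrounding whitespace from each line and drop blank lines."""
    # str.splitlines also splits on form feeds, so a trailing one is removed.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

It would be applied as text = clean_ocr_text(text) right after the extraction step.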

A file named "recognized_text.txt" is created to hold the recognized text.

# Create (or empty) the file recognized_text.txt
file = open("recognized_text.txt", "w+")
file.close()

The text in the saved string is appended to the file created.

# Append the extracted text to the file
file = open("recognized_text.txt","a")
file.write(text)
file.close()
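The create-then-append sequence above can also be collapsed into a single write. A minimal sketch, assuming the same output file name; save_text is a hypothetical helper, and the explicit UTF-8 encoding is an added assumption that helps when the recognized text contains non-ASCII characters.

```python
def save_text(text, path="recognized_text.txt"):
    """Write text to path, replacing any previous contents."""
    # The with statement closes the file even if the write fails.
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)
    return path
```

Opening with "w" creates the file if it does not exist and empties it otherwise, so no separate creation step is needed.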

Output:

A text file is created, and printing its contents shows the text extracted from the image.
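To check the result from Python, the saved file can be read back and printed. A small sketch, assuming the file name used above; read_recognized_text is a hypothetical helper.

```python
from pathlib import Path


def read_recognized_text(path="recognized_text.txt"):
    """Return the contents of the saved text file."""
    return Path(path).read_text(encoding="utf-8")
```

Calling print(read_recognized_text()) after running the script displays the extracted text.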

Coders Packet