Coders Packet

Text recognition and extraction from an image using Python pytesseract.

By Chereddy Vivek Reddy

In this project, we will learn how to recognize and extract text from a given image using Python. We use OpenCV to load the image and the pytesseract module to extract the text.

Description:

  • The Python code in this packet recognizes and extracts text from an image.
  • OpenCV loads the image, and the pytesseract module recognizes and extracts the text in it.
  • The extracted text is saved as a .txt file.

Installing required libraries:

  • Steps to install OpenCV

      Open the command prompt and run the "pip install opencv-python" command to install OpenCV.

      Follow this link to install Python and OpenCV.

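  • Steps to install pytesseract

      Go to this link and download the Tesseract installer suitable for your system.

      After downloading the Tesseract setup file, run it.

      Follow this video to complete the installation; you have to add the path of the Tesseract installation folder to your system's PATH environment variable.

      pytesseract itself is the Python wrapper around Tesseract; install it with the "pip install pytesseract" command.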
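
Because pytesseract only works when it can locate the Tesseract executable, a quick check like the one below can confirm that the PATH step worked. This is a minimal sketch using only the standard library; the helper name find_tesseract is made up for illustration, and it assumes the executable is called "tesseract".

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found on PATH; check the environment variable.")
    else:
        print("Tesseract found at:", path)
```

If Tesseract is installed but not on PATH, pytesseract also accepts the full path of the executable assigned to pytesseract.pytesseract.tesseract_cmd.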

 

        

Input Image:

  • This image is given as input.

Code:

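  • Importing the required libraries

# importing required libraries (OpenCV and pytesseract)
import cv2
import pytesseract

  • Loading the input image

# reading the image; cv2.imread returns None if the file cannot be read
img = cv2.imread('img.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'img.jpg'")
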
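  • pytesseract.image_to_string detects the text in the image and returns it as a string

# pytesseract assumes RGB channel order, while OpenCV loads images as BGR
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# extracting the text in the image
text = pytesseract.image_to_string(img_rgb, lang='eng')
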
  • A file named "recognized_text.txt" is created to hold the recognized text

# creating an empty file named recognized_text.txt
file = open("recognized_text.txt", "w+")
file.close()

  • The text in the saved string is appended to the file created

# appending the extracted text to the file
file = open("recognized_text.txt", "a")
file.write(text)
file.close()
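
The create-then-append steps above can also be collapsed into a single write. The following is a sketch of that alternative, not the packet's own code; the helper name save_text and the choice of UTF-8 encoding are assumptions made for illustration.

```python
def save_text(text, path="recognized_text.txt"):
    """Write text to path, creating the file or overwriting an existing one."""
    # "w" mode creates the file if needed, so no separate creation step is required;
    # the with-statement closes the file even if the write fails.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Calling save_text(text) after the extraction step would leave the same recognized_text.txt file on disk, with the file opened only once.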

 

Output:

  • A text file named recognized_text.txt is created; printing its contents shows the text extracted from the image.
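
Printing the saved file might look like the sketch below. The helper name read_recognized_text is hypothetical, and stripping the surrounding whitespace is an assumption added for tidier output rather than part of the packet.

```python
def read_recognized_text(path="recognized_text.txt"):
    """Return the contents of the saved text file without surrounding whitespace."""
    # The platform default encoding is used so it matches how the file was written.
    with open(path, "r") as f:
        return f.read().strip()

# Example (requires recognized_text.txt to exist):
# print(read_recognized_text())
```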

 

 


Submitted by Chereddy Vivek Reddy (Vivek)
