Text Detection and Extraction with OpenCV and OCR in Python

Text detection and extraction involve finding and reading text in images. In Python, we can use OpenCV for image processing and an OCR (optical character recognition) tool such as Tesseract for reading the text.

This tutorial shows how to detect text and extract it using OpenCV and OCR in Python.

Step 1: Installation

  1. Install Tesseract OCR
    1. Download and install Tesseract from https://github.com/tesseract-ocr/tesseract
    2. Install pytesseract: pip install pytesseract
  2. Install OpenCV: pip install opencv-python
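
Before running the example, it can help to confirm that the Tesseract executable is reachable. The following is a minimal sketch that uses only the standard library; the helper name find_tesseract is hypothetical, and it assumes the executable is called "tesseract" on your system.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the Tesseract executable on PATH, or None."""
    return shutil.which(name)


tesseract_path = find_tesseract()
if tesseract_path is None:
    # Not on PATH: point pytesseract.pytesseract.tesseract_cmd at the executable manually
    print("Tesseract was not found on PATH.")
else:
    print("Tesseract found at:", tesseract_path)
```

If the path comes back as None, setting pytesseract.pytesseract.tesseract_cmd to the install location, as done in Step 2, is the usual workaround.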

Step 2: Example code

#

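import cv2
import pytesseract

# Path to the Tesseract executable (Windows default; adjust or remove if Tesseract is on PATH)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Convert image to grayscale
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply Otsu thresholding to separate text from background
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Noise removal with a 5x5 median filter
def remove_noise(image):
    return cv2.medianBlur(image, 5)

# Load image (cv2.imread returns None if the file cannot be read)
img = cv2.imread("untitled.png")
if img is None:
    raise FileNotFoundError("Could not read untitled.png")

# Preprocess: grayscale, then thresholding, then noise removal
processed = remove_noise(thresholding(get_grayscale(img)))

# OCR on the full preprocessed image
text = pytesseract.image_to_string(processed)

# Display results
print(text)
cv2.imshow("Img", processed)
cv2.waitKey(0)
cv2.destroyAllWindows()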

#

Output:

Here is the link to the image: https://photos.app.goo.gl/V1BBDG3RJP5ijWZHA

#
don't stare too long
you'll miss the train
#
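
The example prints only the recognized string. For the detection part, pytesseract.image_to_data also reports a bounding box and a confidence score for each word. The sketch below shows one way to filter that output; the helper names words_with_boxes and detect_words, the 60 percent confidence cutoff, and the sample dictionary are all assumptions made for illustration, not values taken from the image above.

```python
def words_with_boxes(data, min_conf=60.0):
    """Turn an image_to_data dictionary into (text, (x, y, w, h)) tuples."""
    results = []
    for text, conf, x, y, w, h in zip(
        data["text"], data["conf"], data["left"],
        data["top"], data["width"], data["height"],
    ):
        # Rows for blocks and lines have empty text and a confidence of -1
        if text.strip() and float(conf) >= min_conf:
            results.append((text, (x, y, w, h)))
    return results


def detect_words(image, min_conf=60.0):
    """Run Tesseract on an image and return confident words with their boxes."""
    # Imported here so words_with_boxes can be tried without Tesseract installed
    import pytesseract
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return words_with_boxes(data, min_conf)


# Hand-written sample in the same shape as image_to_data output (not real OCR results)
sample = {
    "text": ["", "hello", "world", "smudge"],
    "conf": [-1, 96, 91, 12],
    "left": [0, 10, 70, 140],
    "top": [0, 12, 12, 14],
    "width": [200, 50, 55, 30],
    "height": [40, 20, 20, 18],
}
print(words_with_boxes(sample))
```

With a real image, detect_words(img) returns the same kind of list, and each box can be drawn with cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2).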

Step 3: Explanation

  • Grayscale

    image is the original color image loaded with cv2.imread(), which stores pixels in Blue-Green-Red (BGR) channel order.
    cv2.COLOR_BGR2GRAY converts the image from BGR to single-channel grayscale.

  • Thresholding

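    Thresholding turns the grayscale image into pure black and white, which makes the text more distinct and aids OCR.
    0: The threshold argument. It is ignored here because cv2.THRESH_OTSU picks the threshold automatically from the image histogram.
    255: The maximum value, assigned to pixels above the threshold.
    cv2.THRESH_BINARY: The type of thresholding. Pixels above the threshold become 255 and all others become 0.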

  • Noise removal

    cv2.medianBlur replaces each pixel with the median of its neighborhood (a 5x5 window here), which works well against salt-and-pepper noise.

  • Text extraction

    Tesseract OCR (pytesseract.image_to_string) extracts the text from the processed image and returns it as a string.
