Text Detection and Extraction with OpenCV and OCR in Python

Text detection and extraction involve finding text in an image and then reading it. In Python, we can use OpenCV for image processing and an OCR (optical character recognition) engine such as Tesseract for reading the text.

This post shows how to detect text and extract it using OpenCV and OCR in Python.

Step 1: Installation

Install Tesseract OCR
1. Download and install the Tesseract OCR engine from https://github.com/tesseract-ocr/tesseract
2. Install pytesseract, the Python wrapper that calls the Tesseract engine: pip install pytesseract

Install OpenCV
pip install opencv-python
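
Before running the example, a quick check like the one below can confirm that all three pieces are in place. This is a minimal sketch using only the standard library; check_ocr_setup is a hypothetical helper name, not part of pytesseract or OpenCV. It looks for the tesseract program on PATH and checks that the pytesseract and cv2 packages can be imported. On Windows the installer may not add Tesseract to PATH, so the binary check can report it missing even though pointing tesseract_cmd at the executable (as in Step 2) still works.

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report whether each piece of the OCR toolchain can be found (hypothetical helper)."""
    return {
        # The Tesseract engine is a separate program, looked up on PATH
        "tesseract_binary": shutil.which("tesseract") is not None,
        # The Python packages installed with pip
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "opencv": importlib.util.find_spec("cv2") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ocr_setup().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```
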

Step 2: Example code

#

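import cv2
import pytesseract

# Path to the Tesseract executable. Needed on Windows when tesseract is not on PATH;
# adjust it to your install location, or remove this line on Linux/macOS.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


# Convert a BGR color image to grayscale
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


# Noise removal with a 5x5 median filter
def remove_noise(image):
    return cv2.medianBlur(image, 5)


# Binarize the image; Otsu's method picks the threshold automatically
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]


# Load image (cv2.imread returns None instead of raising if the file is missing)
img = cv2.imread("untitled.png")
if img is None:
    raise FileNotFoundError("Could not read untitled.png")

# Preprocess: grayscale -> denoise -> threshold
gray = get_grayscale(img)
denoised = remove_noise(gray)
processed = thresholding(denoised)

# OCR on the preprocessed image
text = pytesseract.image_to_string(processed)

# Display results
print(text)
cv2.imshow("Original", img)
cv2.imshow("Processed", processed)
cv2.waitKey(0)
cv2.destroyAllWindows()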

#

Output:

Here is the link to the image: https://photos.app.goo.gl/V1BBDG3RJP5ijWZHA

#
don't stare too long
you'll miss the train
#

Step 3: Explanation

Grayscale

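The image argument is the original color image loaded with cv2.imread().
The cv2.COLOR_BGR2GRAY flag tells cv2.cvtColor() to convert it from Blue-Green-Red (BGR), the channel order OpenCV uses for color images, to a single-channel grayscale image.
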
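
To see what the conversion does numerically, the sketch below applies the weighted sum documented for OpenCV's color-to-gray conversion, Y = 0.299 R + 0.587 G + 0.114 B, one pixel at a time. It is plain Python on a tiny hand-made image stored as rows of (B, G, R) tuples, an assumption made for illustration; bgr_to_gray and image_to_gray are hypothetical helpers. cv2.cvtColor does the same job on whole NumPy arrays with its own fixed-point rounding, so individual values may not match it exactly.

```python
def bgr_to_gray(pixel):
    """Convert one (B, G, R) pixel to a gray level with the 0.299/0.587/0.114 weights."""
    b, g, r = pixel  # note the BGR order used by OpenCV
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(image):
    """Apply bgr_to_gray to every pixel of an image stored as rows of (B, G, R) tuples."""
    return [[bgr_to_gray(px) for px in row] for row in image]


# A 1x3 image: pure blue, pure green, pure red
sample = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(image_to_gray(sample))  # [[29, 150, 76]]
```

Green contributes the most to perceived brightness, which is why pure green maps to a much lighter gray than pure blue.
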
Thresholding

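Thresholding turns the grayscale image into pure black and white, which makes the text more distinct and helps OCR.
0: The threshold value. It is ignored here because cv2.THRESH_OTSU computes the threshold automatically from the image histogram.
255: The value assigned to pixels above the threshold; all other pixels become 0.
cv2.THRESH_BINARY + cv2.THRESH_OTSU: The type of thresholding, binary thresholding with the threshold chosen by Otsu's method.
cv2.threshold() returns a pair (threshold used, thresholded image); the [1] index keeps only the image.
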
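
Otsu's method picks the threshold that best separates the histogram into two classes by maximizing the between-class variance. The pure-Python sketch below shows the idea on a handful of gray levels and then applies the THRESH_BINARY rule (above the threshold becomes maxval, everything else 0). otsu_threshold and binary_threshold are hypothetical helpers, not OpenCV functions; OpenCV's own implementation works on whole images and may break ties differently.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance (Otsu's method).

    Pixels <= t form one class and pixels > t the other, matching how a
    binary threshold splits the image.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_bg, sum_bg = 0, 0
    for t in range(256):
        count_bg += hist[t]  # pixels with value <= t
        sum_bg += t * hist[t]
        count_fg = total - count_bg  # pixels with value > t
        if count_bg == 0 or count_fg == 0:
            continue
        mean_bg = sum_bg / count_bg
        mean_fg = (sum_all - sum_bg) / count_fg
        # Between-class variance with unnormalized weights (same maximizing t)
        var_between = count_bg * count_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binary_threshold(pixels, thresh, maxval=255):
    """THRESH_BINARY rule: maxval where the pixel is above thresh, else 0."""
    return [maxval if p > thresh else 0 for p in pixels]


# Dark text strokes (around 30) on a light background (around 200)
gray = [28, 30, 32, 198, 200, 202, 199, 201]
t = otsu_threshold(gray)
print(t, binary_threshold(gray, t))  # 32 [0, 0, 0, 255, 255, 255, 255, 255]
```

Because the text in this sample is darker than the background, the strokes end up as 0 (black) and the background as 255 (white).
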
Noise removal

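A median filter (cv2.medianBlur) replaces each pixel with the median of its neighborhood, which makes it well suited to removing salt-and-pepper noise (isolated white and black specks). The kernel size, 5 here, must be an odd number greater than 1.
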
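
The median filter is easy to sketch: slide a k x k window over the image and replace each pixel with the median of the window. The pure-Python version below works on a list of rows of gray levels; median_blur is a hypothetical helper, and replicating the nearest edge pixel at the borders is an assumption of this sketch. It shows why a lone salt (255) or pepper (0) pixel disappears: it is outvoted by its neighbors.

```python
def median_blur(image, ksize=3):
    """Median-filter a 2D list of gray levels with a ksize x ksize window.

    Borders are handled by replicating the nearest edge pixel.
    """
    if ksize < 3 or ksize % 2 == 0:
        raise ValueError("ksize must be an odd number >= 3")
    rows, cols = len(image), len(image[0])
    r = ksize // 2
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            # Gather the window, clamping coordinates at the image edges
            window = [
                image[min(max(y + dy, 0), rows - 1)][min(max(x + dx, 0), cols - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            window.sort()
            out_row.append(window[len(window) // 2])
        out.append(out_row)
    return out


# A flat gray patch with one "salt" (255) and one "pepper" (0) pixel
noisy = [
    [100, 100, 100, 100],
    [100, 255, 100, 100],
    [100, 100, 0, 100],
    [100, 100, 100, 100],
]
print(median_blur(noisy))  # every pixel becomes 100
```

A mean (box) filter would instead smear the 255 and 0 values into their neighbors, which is why the median is preferred for this kind of noise.
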
Text extraction

pytesseract.image_to_string() passes the processed image to the Tesseract OCR engine and returns the recognized text as a Python string.
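
image_to_string() only extracts the text. To also detect where each word is, pytesseract provides image_to_data(), which with output_type=pytesseract.Output.DICT returns parallel lists such as 'text', 'conf', 'left', 'top', 'width' and 'height'. The sketch below filters such a dict down to confidently recognized words. word_boxes is a hypothetical helper, the 60 confidence cutoff is an arbitrary assumption, and the sample dict is made up so the example runs without Tesseract installed.

```python
def word_boxes(data, min_conf=60.0):
    """Collect (text, (x, y, w, h)) for confidently recognized words.

    `data` is assumed to have the shape of the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    boxes = []
    for i, word in enumerate(data["text"]):
        # float() handles both numeric and string confidences; -1 marks non-word rows
        conf = float(data["conf"][i])
        if word.strip() and conf >= min_conf:
            box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            boxes.append((word, box))
    return boxes


# Hand-written sample in the same shape (values are made up for illustration)
sample = {
    "text": ["", "don't", "stare", "?"],
    "conf": [-1, 91.5, 88.0, 12.0],
    "left": [0, 10, 70, 130],
    "top": [0, 12, 12, 12],
    "width": [400, 52, 48, 6],
    "height": [120, 20, 20, 20],
}
print(word_boxes(sample))
# [("don't", (10, 12, 52, 20)), ('stare', (70, 12, 48, 20))]
```

With Tesseract installed, the dict would come from pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT) on the preprocessed image, and each returned box could be drawn on the original with cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2).
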
