Extract Text From an Image Using Python
Welcome! In this tutorial, we will explore how to extract text from images using Python. Sounds interesting, right? You can easily do this using Optical Character Recognition (OCR). We will use Python, Pillow, and the Tesseract OCR engine.
Step 1: Install Python Libraries
- Pillow: A Python Imaging Library that provides image processing capabilities.
- Pytesseract: A Python wrapper for Google’s Tesseract-OCR Engine.
- Tesseract-OCR: The actual OCR engine. It is not a Python package, so it is installed separately in Step 2.
pip install Pillow pytesseract
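If you want to confirm that both packages can be imported after running pip, a quick check along these lines may help. This is only a minimal sketch: the helper name check_libraries is hypothetical, and note that Pillow is imported under the module name PIL.

```python
import importlib.util


def check_libraries(modules=("PIL", "pytesseract")):
    """Return a dict mapping each module name to True if it can be found."""
    # find_spec returns None when a top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_libraries().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If either entry is reported as missing, rerun the pip command in the same Python environment you will use for the script.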
Step 2: Install Tesseract-OCR
- For Windows:
- Download the Tesseract installer from UB Mannheim’s website.
- Run the installer and complete the installation.
- Add Tesseract to your system’s PATH variable (e.g. C:\Program Files\Tesseract-OCR)
- For macOS (using Homebrew):
brew install tesseract
- For Linux (Ubuntu):
sudo apt install tesseract-ocr
Verify the Tesseract Installation
To ensure Tesseract-OCR is installed correctly, run the following command in your terminal. If the installation was successful, it prints the Tesseract version.
tesseract --version
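The same check can also be done from Python, which is handy if you want your script to fail early with a clear message. The sketch below uses only the standard library; the function name tesseract_version is hypothetical, and it assumes the executable is reachable as tesseract on your PATH.

```python
import shutil
import subprocess


def tesseract_version(command="tesseract"):
    """Return the first line of `<command> --version`, or None if unavailable."""
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=False
    )
    # Fall back to stderr in case a release prints the version there.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "Tesseract was not found on PATH")
```

A result of None usually means the PATH entry from Step 2 is missing or the terminal needs to be restarted.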
Step 3: Write the Python Script/Code
Now let’s write a Python script to load an image and extract its text.
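3.1 Import Required Libraries
from PIL import Image
import pytesseract
3.2 Set the Tesseract Path (Windows Only)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
3.3 Load the Image
image_path = 'path_to_your_image.jpg'  # Replace with your image file path
image = Image.open(image_path)
3.4 Perform OCR
extracted_text = pytesseract.image_to_string(image)
3.5 Print the Extracted Text
print("Extracted Text:\n", extracted_text)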
3.6 Full Code Example
Here’s the complete script:
# Import necessary libraries
from PIL import Image
import pytesseract

# Set the Tesseract path for Windows (comment out this line on other operating systems)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Load the image
image_path = 'path_to_your_image.jpg'  # Replace with your image file path
image = Image.open(image_path)

# Perform OCR, i.e. extract text from the image
extracted_text = pytesseract.image_to_string(image)

# Print the result
print("Extracted Text:\n", extracted_text)
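Instead of commenting the path line in and out by hand, you could let the script decide based on the operating system. The following is one possible variant and not the only way to do it: the helper names tesseract_cmd_for and extract_text are hypothetical, and the Windows path is simply the default location used in the script above, so adjust it if you installed Tesseract elsewhere.

```python
import platform
from pathlib import Path

# Install location used in the script above; change it if yours differs.
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def tesseract_cmd_for(system):
    """Return the explicit Tesseract path on Windows, or None to rely on PATH."""
    return WINDOWS_TESSERACT if system == "Windows" else None


def extract_text(image_path):
    """Run OCR on image_path and return the recognized text."""
    # Fail early with a readable message if the image does not exist.
    if not Path(image_path).is_file():
        raise FileNotFoundError(f"Image not found: {image_path}")

    # Imported here so the helper above works without the OCR libraries.
    from PIL import Image
    import pytesseract

    cmd = tesseract_cmd_for(platform.system())
    if cmd is not None:
        pytesseract.pytesseract.tesseract_cmd = cmd

    # The context manager closes the image file once OCR is done.
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


if __name__ == "__main__":
    try:
        print("Extracted Text:\n", extract_text("path_to_your_image.jpg"))
    except FileNotFoundError as error:
        print(error)
```

On macOS and Linux this version leaves pytesseract's default command untouched, so Tesseract only needs to be on your PATH.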
Step 4: Run Your Script
Save the script as extract_text_from_img.py and run it in your terminal or command prompt:
- Navigate to the script directory
cd path\to\your\script
- Run the script using the following terminal command
python extract_text_from_img.py
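If you would rather not edit the script for every new image, you could pass the image path on the command line. Here is a rough sketch using argparse from the standard library; the function names parse_args and main are hypothetical, and it assumes Tesseract is on your PATH (on Windows you may still need the tesseract_cmd line from Step 3).

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Extract text from an image.")
    parser.add_argument(
        "image",
        nargs="?",
        default="path_to_your_image.jpg",
        help="path to the image file",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Print the text found in the image; return 0 on success, 1 on error."""
    args = parse_args(argv)
    if not Path(args.image).is_file():
        print(f"Image not found: {args.image}")
        return 1

    # Imported here so argument parsing works without the OCR libraries.
    from PIL import Image
    import pytesseract

    with Image.open(args.image) as image:
        print("Extracted Text:\n", pytesseract.image_to_string(image))
    return 0


if __name__ == "__main__":
    main()
```

With this version, you would run the script as python extract_text_from_img.py followed by the path to your image.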