In this tutorial, we shall learn how to automatically detect an image on your screen using Python. We'll build an efficient way to search the screen for any image and retrieve its location. PyAutoGUI captures the screenshots, and OpenCV handles image processing and template matching.
Image detection on the screen
Key Libraries and Tools:
- PyAutoGUI: Its main purpose is automating mouse and keyboard actions, but it can also capture screenshots. In this code, it captures the current screen as an image.
- OpenCV: A powerful library for computer vision tasks. This code uses it to load and process images and to run template matching, which searches the screenshot for the target image.
- NumPy: This library provides efficient array manipulation, including operations on image data. In this code, it converts the screenshot into an array that OpenCV can work with.
NumPy, short for Numerical Python, is a library for working with large, multi-dimensional arrays and matrices. It also provides a large collection of high-level mathematical functions that operate on those arrays. It underlies most scientific computing in Python and is widely used in data analysis, machine learning, and image processing.
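The following small pure-Python sketch illustrates the array layout described above without requiring NumPy. An RGB image is organised as height x width x channels, which is the same order `np.array(screen)` produces. The sketch also shows the channel swap that `cv2.COLOR_RGB2BGR` performs on 3-channel images. The helpers `image_shape` and `rgb_to_bgr` are hypothetical names used only for illustration.

```python
# Pure-Python illustration (no NumPy) of the layout np.array(screenshot) uses:
# rows (height) x columns (width) x channels.

def image_shape(pixels):
    """Return (height, width, channels) for a nested-list image."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    channels = len(pixels[0][0]) if width else 0
    return (height, width, channels)


def rgb_to_bgr(pixels):
    """Reverse the channel order of every pixel (RGB -> BGR)."""
    return [[tuple(reversed(px)) for px in row] for row in pixels]


# A tiny 2x3 "screenshot" with pixels stored in RGB order
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 20, 30), (40, 50, 60), (70, 80, 90)],
]
print(image_shape(rgb))       # (2, 3, 3)
print(rgb_to_bgr(rgb)[0][0])  # (0, 0, 255): the red value is now stored last
```

This is why the tutorial's output reports the screen as `(1080, 1920, 3)`: 1080 rows, 1920 columns, 3 channels.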
OpenCV (Open Source Computer Vision Library) is a widely used open-source library for computer vision and image processing, and it also includes machine learning tools. It is used in robotics, artificial intelligence, video analysis, and image processing. Its tools and algorithms let developers manipulate images, videos, or real-time streams for tasks such as object detection, image segmentation, and motion tracking.
Python code
```python
import pyautogui
import cv2
import numpy as np


def locate_image_on_screen(image_path, confidence=0.7, template_size=None):
    """Return the (x, y) centre of image_path on the screen, or None.

    confidence: minimum TM_CCOEFF_NORMED score for a match (lowered to 0.7 here).
    template_size: optional (width, height) to resize the template to before
    matching, e.g. (188, 180), if the image appears at a different size on screen.
    """
    # Capture the screen (pyautogui returns a PIL image in RGB order)
    screen = pyautogui.screenshot()
    # Convert the screenshot to a NumPy array, then RGB -> BGR (OpenCV's order)
    screen_np = cv2.cvtColor(np.array(screen), cv2.COLOR_RGB2BGR)

    # Load the template image; cv2.imread returns None instead of raising on failure
    template = cv2.imread(image_path)
    if template is None:
        print("Error: Could not load image. Check the file path and image format.")
        return None
    print(f"Template image loaded with dimensions: {template.shape}")
    print(f"Screen dimensions: {screen_np.shape}")

    # Resize only when a target size is given; cv2.resize expects (width, height)
    if template_size is not None:
        template = cv2.resize(template, template_size)

    # Perform template matching: each value in result scores one template position
    result = cv2.matchTemplate(screen_np, template, cv2.TM_CCOEFF_NORMED)
    # For TM_CCOEFF_NORMED the best match is the maximum score
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    print(f"Max value: {max_val}, Max location: {max_loc}")

    if max_val < confidence:
        print("Image not found on the screen.")
        return None

    # max_loc is the top-left corner (x, y); shape is (height, width, channels)
    h, w = template.shape[:2]
    center_x = max_loc[0] + w // 2
    center_y = max_loc[1] + h // 2

    # Draw a green rectangle around the match on the screenshot copy
    bottom_right = (max_loc[0] + w, max_loc[1] + h)
    cv2.rectangle(screen_np, max_loc, bottom_right, (0, 255, 0), 2)
    cv2.imshow("Detected", screen_np)
    cv2.waitKey(0)  # Press any key to close the window
    cv2.destroyAllWindows()
    return (center_x, center_y)


# Example usage
image_path = r"C:\Users\pavan\OneDrive\Desktop\image scanning\Screenshot 2024-05-05 192559.png"  # Use the correct absolute path to the image
location = locate_image_on_screen(image_path)
if location is not None:
    print(f"Image found at location: {location}")
```
Output
```
Template image loaded with dimensions: (180, 188, 3)
Screen dimensions: (1080, 1920, 3)
Max value: 1.0, Max location: (861, 452)
```
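The printed values can be checked by hand. Shapes are printed as (height, width, channels), while `max_loc` is given as (x, y). The hypothetical helper `match_geometry` below is a sketch that assumes the values from the output above. It derives two things: the size of the `matchTemplate` result map, which holds one score for each position where the template fits, and the center coordinates the function would return for this match.

```python
def match_geometry(screen_shape, template_shape, max_loc):
    """Compute the matchTemplate result size and the centre of the best match.

    screen_shape / template_shape are (height, width, channels) tuples;
    max_loc is the (x, y) top-left corner reported by cv2.minMaxLoc.
    """
    screen_h, screen_w = screen_shape[:2]
    h, w = template_shape[:2]
    # One score per position where the template fits entirely inside the screen
    result_shape = (screen_h - h + 1, screen_w - w + 1)
    # Centre = top-left corner + half the template width/height
    center = (max_loc[0] + w // 2, max_loc[1] + h // 2)
    return result_shape, center


result_shape, center = match_geometry((1080, 1920, 3), (180, 188, 3), (861, 452))
print(result_shape)  # (901, 1733)
print(center)        # (955, 542)
```

Mixing up the two conventions is a common source of off-by-axis bugs, which is also why `cv2.resize` takes its target size as (width, height).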
How the code works
- Screen Capture: pyautogui.screenshot() captures the current screen as a PIL image. The code converts it to a NumPy array and then from RGB to BGR, the channel order OpenCV expects.
- Image Loading: cv2.imread() loads the image we want to find on the screen. If the image fails to load, typically because of a wrong file path or an unsupported format, cv2.imread() returns None, and the function prints an error message and returns None.
- Template Matching: Template matching is a core OpenCV feature. It slides a small image, the template, across a larger image (here, the screenshot) and scores how similar each region is to the template. The code calls cv2.matchTemplate() with the cv2.TM_CCOEFF_NORMED method, whose scores range from -1 to 1, where 1 means a perfect match.
- Confidence Threshold: cv2.minMaxLoc() finds the best match, meaning the highest score (max_val) and its position (max_loc). If max_val is at least the confidence threshold (0.7 by default), the image is treated as found on the screen.
- Rectangle and Visual Feedback: To verify the result visually, the code draws a green rectangle around the detected region on the screenshot (not on the actual screen) and displays it with OpenCV's cv2.imshow(). The window stays open until a key is pressed.
- Center Coordinates: On a successful match, the function computes and returns the coordinates of the center of the matched region. You can pass these to PyAutoGUI, for example with pyautogui.click(x, y), to click on the image.
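The following pure-Python sketch makes the matching, threshold, and center steps concrete. It computes the score that cv2.TM_CCOEFF_NORMED is based on, the zero-mean normalized cross-correlation, and applies it by brute force to small grayscale grids. The names `ccoeff_normed` and `locate` are hypothetical helpers for illustration, not OpenCV functions. Scoring flat (zero-variance) regions as 0 is an assumption of this sketch. The code is far slower than cv2.matchTemplate and is meant only to show the logic.

```python
import math


def ccoeff_normed(patch, template):
    """Zero-mean normalised cross-correlation of two equal-sized 2-D grids."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    p_mean, t_mean = sum(p) / len(p), sum(t) / len(t)
    p0 = [v - p_mean for v in p]  # subtract each grid's mean
    t0 = [v - t_mean for v in t]
    num = sum(a * b for a, b in zip(p0, t0))
    den = math.sqrt(sum(a * a for a in p0) * sum(b * b for b in t0))
    # Assumption of this sketch: a flat region (zero variance) scores 0
    return num / den if den else 0.0


def locate(screen, template, confidence=0.7):
    """Slide template over screen; return the centre (x, y) of the best match or None."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_val, best_loc = -1.0, None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            patch = [row[x:x + tw] for row in screen[y:y + th]]
            val = ccoeff_normed(patch, template)
            if val > best_val:
                best_val, best_loc = val, (x, y)
    # Apply the confidence threshold to the best score only
    if best_loc is None or best_val < confidence:
        return None
    return (best_loc[0] + tw // 2, best_loc[1] + th // 2)


screen = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0, 0],
    [0, 0, 1, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 1], [1, 9]]
print(locate(screen, template))            # (3, 3)
print(locate(screen, [[1, 9], [9, 1]]))    # None: best score is below 0.7
```

The template matches exactly at top-left (2, 2), giving a score of 1.0 and a center of (2 + 1, 2 + 1). The mirrored template scores at most about 0.58 anywhere on this grid, so the threshold rejects it.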