Join me in writing a Python program that fetches the URL of any company, even when the company’s name is entered with spelling mistakes. The name may be misspelled, but it should be a close match for the program to return accurate results. In this tutorial, we shall use two important libraries: BeautifulSoup, imported from bs4, and requests.
Pre-requisite knowledge required: fundamentals of Python and sheer curiosity to implement the code all by yourself.
Let’s get started.
Importing the libraries
Beautiful Soup is a popular Python library used for web scraping. It parses HTML so that the relevant data can be extracted from a page.
requests is used to send HTTP requests and so interact with web servers.
Code to import both the libraries:
from bs4 import BeautifulSoup
import requests
Creating the function
- We shall now create a function that takes the string entered by the user and sends a GET request to Google with that string as the query.
- The function then parses the returned HTML to check whether Google suggests a corrected spelling for a misspelled name.
- It returns the URL of the corrected search if a suggestion is found, or the first search result link if the name is spelled correctly. The URLs are read from the href attributes of the anchor tags.
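To make the first step concrete, here is a minimal sketch of how a query string ends up in the request URL. It uses only the standard library's urllib.parse instead of requests, so no request is actually sent. The helper name build_search_url is hypothetical and is not part of the tutorial's code.

```python
from urllib.parse import urlencode

def build_search_url(query):
    """Return the search URL for a query (hypothetical helper, illustration only)."""
    base_url = "https://www.google.com/search"
    # urlencode turns the dictionary into a query string and escapes
    # characters, such as spaces, that cannot appear in a URL as-is.
    return base_url + "?" + urlencode({"q": query})

print(build_search_url("Yaho"))
print(build_search_url("Tata Motors"))
```

When requests.get is given a params dictionary, it performs an equivalent encoding before sending the request.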
Code to implement the above-mentioned steps:
def search_company(search_query):
    url = 'https://www.google.com/search'
    # Browser-like headers sent along with the request
    headers = {
        'Accept': '*/*',
        'Accept-Language': 'en-US,en;q=0.5',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:92.0) Gecko/20100101 Firefox/92.0',
    }
    parameters = {'q': search_query}
    content = requests.get(url, headers=headers, params=parameters).text
    soup = BeautifulSoup(content, 'html.parser')

    # Case 1: Google shows a spelling correction
    correction_tag = soup.find('a', string=lambda text: text and "Showing results for" in text)
    if correction_tag:
        corrected_url = correction_tag['href']
        print(f"Google suggests: {correction_tag.get_text()}")
        return "https://www.google.com" + corrected_url

    # Case 2: no correction, so take the first link in the results block
    search_results = soup.find(id='search')
    if search_results:
        first_link = search_results.find('a')
        if first_link:
            return first_link['href']
    return None
We include the dictionary named “headers” so that the request resembles one sent by a regular browser, which reduces the chance of it being identified as a bot.
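The two lookups inside search_company can be tried without contacting Google. The sketch below applies the same BeautifulSoup calls to two small HTML strings. The markup is hand-written for illustration and is an assumption: real result pages are far larger and their structure changes over time. The function name extract_url is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from a real Google page.
SAMPLE_WITH_CORRECTION = """
<div>
  <a href="/search?q=yahoo">Showing results for yahoo</a>
  <div id="search"><a href="https://www.yahoo.com/">Yahoo</a></div>
</div>
"""

SAMPLE_WITHOUT_CORRECTION = """
<div id="search"><a href="https://www.yahoo.com/">Yahoo</a></div>
"""

def extract_url(html):
    """Apply the same two lookups as search_company to an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # Look for an anchor whose text contains the correction notice.
    correction_tag = soup.find(
        "a", string=lambda text: text and "Showing results for" in text
    )
    if correction_tag:
        return "https://www.google.com" + correction_tag["href"]
    # Otherwise fall back to the first link inside the element with id="search".
    search_results = soup.find(id="search")
    if search_results:
        first_link = search_results.find("a")
        if first_link:
            return first_link["href"]
    return None

print(extract_url(SAMPLE_WITH_CORRECTION))
print(extract_url(SAMPLE_WITHOUT_CORRECTION))
```

Keeping the parsing separate from the network call in this way makes it easier to check the logic whenever the page structure changes.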
Obtaining the input from the user
Store the input entered by the user as company_name.
Code:
company_name = input("Enter the company's name: ")
Calling the function
Create a variable named “url” to store the output obtained after calling the function created earlier.
Code:
url = search_company(company_name)
Final call
Use a basic if-else statement to print the URL of the company if one was found.
Code:
if url:
    print("Company URL:", url)
else:
    print("No results found.")
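The value printed here is whatever href the function found. Depending on the page variant Google serves, that href may be a relative redirect of the form /url?q=<target>&... rather than the target address itself. This is an assumption about the markup, not something the example in this tutorial shows. A minimal sketch of a hypothetical helper, clean_result_url, that unwraps such a link using only the standard library:

```python
from urllib.parse import urlparse, parse_qs

def clean_result_url(href):
    """Return the target of a '/url?q=...' redirect link, else href unchanged."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each query parameter to a list of its values.
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href

print(clean_result_url("/url?q=https://in.yahoo.com/&sa=U"))
print(clean_result_url("https://in.yahoo.com/"))
```

Links that are already absolute pass through unchanged, so the helper could be applied to every result before printing.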
Example:
Company name entered: Yaho
Output:
Enter the company's name: Yaho
Company URL: https://in.yahoo.com/