Scraping IMDB data with Python and BeautifulSoup

This is a Python project to scrape data from the IMDB site, such as the top-rated movies and popular movies pages, or any other page that uses a similar template. It uses a Python module called BeautifulSoup.

In this project, we will be scraping data from the internet, specifically from IMDB, one of the most popular movie rating websites. IMDB has many pages with a lot of content, and collecting and storing that data can sometimes be useful. IMDB pages of the same kind follow a fixed template, so this scraper works for all pages with a template similar to:

The scraper extracts data such as the movie name, the poster image link, the year of release and the rating of the movie.
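The four fields can be pulled out of each chart row with a few BeautifulSoup calls. The sketch below is a minimal illustration, not the project's own scrape_data.py. The sample HTML and the class names it uses (lister-list, posterColumn, titleColumn, secondaryInfo, imdbRating) are assumptions modeled on an older IMDB chart layout, and IMDB's live markup may differ or change over time; parse_chart_rows is a hypothetical name. It requires the beautifulsoup4 package.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on an older IMDB chart table layout.
# The real page structure may differ; adjust the selectors accordingly.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="posterColumn">
        <a href="#"><img src="https://example.com/poster.jpg" alt="Example Movie"></a>
      </td>
      <td class="titleColumn">
        <a href="#">Example Movie</a>
        <span class="secondaryInfo">(2001)</span>
      </td>
      <td class="ratingColumn imdbRating"><strong>8.5</strong></td>
    </tr>
  </tbody>
</table>
"""


def parse_chart_rows(html):
    """Return a list of dicts with name, poster, year and rating for each row."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for row in soup.select("tbody.lister-list tr"):
        title_link = row.select_one("td.titleColumn a")
        if title_link is None:
            continue  # skip rows that do not match the assumed template
        year_span = row.select_one("td.titleColumn span.secondaryInfo")
        poster_img = row.select_one("td.posterColumn img")
        rating_tag = row.select_one("td.imdbRating strong")
        movies.append({
            "name": title_link.get_text(strip=True),
            "poster": poster_img["src"] if poster_img else None,
            # "(2001)" -> "2001"
            "year": year_span.get_text(strip=True).strip("()") if year_span else None,
            # Rows without a rating element get None instead of raising
            "rating": float(rating_tag.get_text(strip=True)) if rating_tag else None,
        })
    return movies
```

Each missing element yields None rather than an exception, so a row that only partly matches the template does not abort the whole scrape.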

The project relies on the Python module BeautifulSoup, which provides many functions for parsing HTML pages and extracting data from them. After extraction, the data can be saved in JSON or CSV format, depending on the user's choice.

Using the project: open a terminal in the project directory and run the command python3 scrape_data.py. The script will ask for the URL of the page to be scraped, for example https://www.imdb.com/chart/moviemeter/ (Popular Movies).
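Before parsing, the script has to download the page's HTML. The project description does not say how it does this, so the following is only a sketch using the standard library's urllib. Sending a browser-like User-Agent header is an assumption (some sites reject the default Python client string), and the names DEFAULT_HEADERS, build_request and fetch_html are hypothetical.

```python
import urllib.request

# Hypothetical header values; some sites block the default "Python-urllib" agent.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Create a GET request for the given URL with browser-like headers."""
    return urllib.request.Request(url, headers=DEFAULT_HEADERS)


def fetch_html(url, timeout=10):
    """Download the page and return its body decoded as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In an interactive script this might be called as html = fetch_html(input("Enter the URL: ")), after which the returned text is handed to BeautifulSoup. Requesting English via Accept-Language is a guess at keeping titles consistent; drop it if it is not needed.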

Then enter the filename as instructed in the terminal. The file can be stored in either JSON or CSV format. If the data is scraped successfully, the file is created and the data is stored in it.
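One simple way to let the filename decide the format is to look at its extension. This is a sketch under that assumption (the project's actual prompt and rules may differ); save_records is a hypothetical name, and the records are assumed to be a list of dicts sharing the same keys, such as the movie name, poster link, year and rating.

```python
import csv
import json
import os


def save_records(records, filename):
    """Write a list of dicts to filename as JSON or CSV, chosen by extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".json":
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(records, f, indent=2, ensure_ascii=False)
    elif extension == ".csv":
        # Column order follows the keys of the first record
        fieldnames = list(records[0].keys()) if records else []
        with open(filename, "w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError("Filename must end in .json or .csv")
```

Note that CSV stores every value as text, so a numeric rating comes back as a string when the file is read again, while JSON keeps it as a number.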
