By Mihir Shri
This project uses Python to produce a list of the Top 10 movies of all time based on their IMDB ratings. The ratings are normalized to take into account the number of votes each movie has received.
The problem:
Any user can go to the IMDB website, check the rating of a movie, and then watch it, right? Yes, of course, but there's a catch. A movie's rating changes as it receives more and more votes, so a rating based on only a few votes is less reliable. Let us understand the problem with the help of an example.
Suppose a certain movie (movie1) has been rated by 5 different users and has received a rating of 5 from all of them, so the average rating of movie1 is 5. Now suppose another movie (movie2) has been rated by 10,000 different users and has an average rating of 4.7. Most users would prefer to watch movie2 over movie1, because movie2 has been watched and liked by far more people. The 5 users who rated movie1 might share a similar taste in movies while you have a completely different taste, but the probability of that happening with 10,000 users is very low.
Python libraries used:
1. pandas
2. numpy
The CSV file "movies_metadata.csv" is read using the read_csv('filename') function of the pandas library. The function takes the filename as input and returns the contents of the CSV file as a DataFrame.
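A minimal sketch of this loading step is shown here. The column names title, vote_count, and vote_average are assumptions about the layout of the file; the write-up does not list them. So that the example runs without the real file, it reads a small in-memory stand-in instead.

```python
import io
import pandas as pd

# Stand-in for "movies_metadata.csv" so the sketch runs without the real file.
# The column names below are assumed, not taken from the project.
sample_csv = io.StringIO(
    "title,vote_count,vote_average\n"
    "movie1,5,5.0\n"
    "movie2,10000,4.7\n"
)

# With the real file this would be: pd.read_csv("movies_metadata.csv")
movies = pd.read_csv(sample_csv)

print(type(movies).__name__)  # DataFrame
print(movies.shape)           # (2, 3)
```

read_csv accepts either a path or a file-like object, which is why the in-memory buffer can take the place of the filename here.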
The following formula is used to solve the problem described above:
(v * R / (v + m)) + (m * C / (v + m))
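where,
v = number of votes received by the movie
m = minimum number of votes required for a movie to be considered
R = average rating of the movie
C = mean rating across all movies in the dataset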
This formula is applied to generate a score for each movie, and the movies are then sorted in descending order of their scores to obtain the Top 10.
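The scoring and sorting steps can be sketched as follows. This is an illustration on toy data, not the project's actual code. Here v is taken as the vote count, R as the movie's average rating, C as the mean rating over all movies, and m as a vote threshold. The column names, the two extra rows, and the value m = 1000 are all assumptions made for the example.

```python
import pandas as pd

# Toy data: movie1 and movie2 mirror the example in the text;
# movie3 and movie4 are made-up rows. Column names are assumed.
movies = pd.DataFrame({
    "title": ["movie1", "movie2", "movie3", "movie4"],
    "vote_count": [5, 10000, 2500, 40],
    "vote_average": [5.0, 4.7, 4.1, 3.0],
})

def weighted_score(v, R, m, C):
    """Return (v * R / (v + m)) + (m * C / (v + m))."""
    return (v * R / (v + m)) + (m * C / (v + m))

# C: mean rating across all movies in the table.
C = movies["vote_average"].mean()
# m: vote threshold. 1000 is an arbitrary choice for this toy data;
# the write-up does not say how m is chosen.
m = 1000

# pandas applies the arithmetic element-wise, one score per row.
movies["score"] = weighted_score(
    movies["vote_count"], movies["vote_average"], m, C
)

# Sort in descending order of score and keep at most 10 rows.
top = movies.sort_values("score", ascending=False).head(10)
print(top[["title", "vote_count", "vote_average", "score"]])
```

With these numbers movie2 ranks above movie1 even though movie1 has the higher raw average. With only 5 votes, movie1's score is pulled almost entirely toward the overall mean C.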
Submitted by Mihir Shri (coe18b064)