Creating a movie recommendation system based on what users like

Hey fellas!

Let us create a movie recommendation system based on what the user likes, using Python and the pandas library.
We will process a movie dataset downloaded from the web, containing thousands of movies with their genres, release dates, and other features, to obtain a set of movies ranked by a similarity score.
Link to download the movie dataset –> https://files.grouplens.org/datasets/movielens/ml-25m.zip

Follow the steps below to create the program.

Importing the necessary libraries and the dataset

import pandas as pd
movies=pd.read_csv("movies.csv")
# view the first 5 rows of the movies dataframe (head() defaults to 5 rows)
movies.head()

Cleaning the movie titles using regex

The clean_title function removes special characters and punctuation from a movie title. We apply it to every movie title and store the cleaned titles in a new column, clean_title.

import re

def clean_title(title):
    # keep letters, digits, and spaces; drop every other character
    title = re.sub("[^a-zA-Z0-9 ]", "", title)
    return title

movies["clean_title"]=movies["title"].apply(clean_title)
movies
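
As a quick check of what the regular expression does, here is a minimal sketch on a made-up sample title (`sample` is an illustrative value, not read from the dataset). It compares a character class that keeps spaces with one that does not, since the vectorizer in the next step splits titles into words on whitespace.

```python
import re

# Hypothetical sample title in the "Name (Year)" style
sample = "Toy Story (1995)"

# Character class without a space: spaces are removed too,
# so the whole title collapses into a single token.
no_space = re.sub("[^a-zA-Z0-9]", "", sample)

# Character class with a space: only punctuation is removed,
# so the words stay separated.
with_space = re.sub("[^a-zA-Z0-9 ]", "", sample)

print(no_space)    # ToyStory1995
print(with_space)  # Toy Story 1995
```

Keeping the space matters for word-level matching: a query such as "Toy Story" can only match a stored title if both are split into the same words.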

Finding the unique terms using TfidfVectorizer

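TfidfVectorizer converts the cleaned movie titles into numerical vectors using the Term Frequency-Inverse Document Frequency (TF-IDF) method. Each row of the resulting tfidf matrix represents one title as a vector of term weights; with ngram_range=(1,2), both single words and pairs of adjacent words count as terms. The matrix does not store similarities itself, but two titles can be compared by measuring how close their vectors are.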

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(ngram_range=(1,2))

tfidf = vectorizer.fit_transform(movies["clean_title"])

Defining the search function

The search function cleans the input title, transforms it into a TF-IDF vector, computes the cosine similarity between that vector and every movie title in the dataset, and returns the five most similar movies.

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def search(title):
    title = clean_title(title)
    query_vec = vectorizer.transform([title])
    similarity = cosine_similarity(query_vec, tfidf).flatten()
    # argsort is ascending: take the last five and reverse so the best match comes first
    indices = np.argsort(similarity)[-5:][::-1]
    results = movies.iloc[indices]
    
    return results
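
The ranking step can be tried out on a toy example. The sketch below uses four made-up titles (`toy_titles` and the other `toy_` names are illustrative assumptions) and ranks them against a query. One detail worth knowing: np.argpartition only guarantees that the k largest values end up in the last k positions, not that they are ordered, whereas np.argsort gives a fully ordered ranking.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical cleaned titles standing in for the real dataset
toy_titles = ["Toy Story 1995", "Toy Story 2 1999", "Jumanji 1995", "Heat 1995"]

toy_vectorizer = TfidfVectorizer(ngram_range=(1, 2))
toy_tfidf = toy_vectorizer.fit_transform(toy_titles)

# Vectorize the query with the same vocabulary, then compare to every title
query_vec = toy_vectorizer.transform(["Toy Story"])
similarity = cosine_similarity(query_vec, toy_tfidf).flatten()

# argsort is ascending, so take the last k indices and reverse them
k = 2
top = np.argsort(similarity)[-k:][::-1]

for i in top:
    print(toy_titles[i], round(float(similarity[i]), 3))
```

Titles that share no terms with the query (here "Jumanji 1995" and "Heat 1995") get a similarity of 0 and fall outside the top results.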

Creating an interactive search box using ipywidgets

This creates an interactive text box where users can type a movie title. Search results are shown once the typed title is longer than five characters.

import ipywidgets as widgets
from IPython.display import display

movie_input = widgets.Text(
    value='Toy Story',
    description='Movie Title:',
    disabled=False
)
movie_list = widgets.Output()

def on_type(data):
    with movie_list:
        movie_list.clear_output()
        title = data["new"]
        if len(title) > 5:
            display(search(title))

movie_input.observe(on_type, names='value')
display(movie_input, movie_list)

Loading the “ratings” dataset

The ratings.csv file is included in the zip file linked at the beginning.

ratings = pd.read_csv("ratings.csv")
ratings.dtypes

Finding similar movies based on users

This function finds related movies by looking at the users who rated the given movie highly. It first locates the users who gave the specified movie_id a rating above 4 (in this dataset, 4.5 or 5). For every movie these similar users also rated above 4, it computes the share of similar users who liked it, and keeps only the movies liked by more than 10% of them. It then computes the same share among all users who liked any of those movies. The score is the ratio of the two shares: a high score means the movie is liked much more often by similar users than by users in general. The ten movies with the highest scores are returned as recommendations.

def find_similar_movies(movie_id):
    similar_users = ratings[(ratings["movieId"] == movie_id) & (ratings["rating"] > 4)]["userId"].unique()
    similar_user_recs = ratings[(ratings["userId"].isin(similar_users)) & (ratings["rating"] > 4)]["movieId"]
    similar_user_recs = similar_user_recs.value_counts() / len(similar_users)

    similar_user_recs = similar_user_recs[similar_user_recs > .10]
    all_users = ratings[(ratings["movieId"].isin(similar_user_recs.index)) & (ratings["rating"] > 4)]
    all_user_recs = all_users["movieId"].value_counts() / len(all_users["userId"].unique())
    
    rec_percentages = pd.concat([similar_user_recs, all_user_recs], axis=1)
    rec_percentages.columns = ["similar", "all"]
    
    rec_percentages["score"] = rec_percentages["similar"] / rec_percentages["all"]
    rec_percentages = rec_percentages.sort_values("score", ascending=False)
    
    return rec_percentages.head(10).merge(movies, left_index=True, right_on="movieId")[["score", "title", "genres"]]
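
To make the score concrete, here is a worked sketch on a tiny made-up ratings table (`toy_ratings` and its values are invented for illustration). It follows the same steps as the function above for target movie 10, without the 10% filter and the final merge.

```python
import pandas as pd

# Hypothetical ratings: users 1-3 liked movie 10; user 4 did not rate it
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3, 4],
    "movieId": [10, 20, 10, 20, 10, 30, 30],
    "rating":  [5.0, 5.0, 4.5, 5.0, 5.0, 4.5, 5.0],
})

liked = toy_ratings[toy_ratings["rating"] > 4]

# Users who liked the target movie
similar_users = liked[liked["movieId"] == 10]["userId"].unique()

# Share of similar users who liked each movie
similar = (liked[liked["userId"].isin(similar_users)]["movieId"].value_counts()
           / len(similar_users))

# Share of all users (who liked any candidate movie) who liked each movie
everyone = liked[liked["movieId"].isin(similar.index)]
all_share = everyone["movieId"].value_counts() / everyone["userId"].nunique()

scores = pd.concat([similar, all_share], axis=1)
scores.columns = ["similar", "all"]
scores["score"] = scores["similar"] / scores["all"]

print(scores.sort_values("score", ascending=False))
```

In this toy data, movie 20 is liked by 2 of the 3 similar users but only 2 of all 4 users, giving a score of about 1.33. Movie 30 is liked by 1 of 3 similar users and 2 of 4 users overall, giving about 0.67, so it ranks lower.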

Setting up movie recommendation interaction

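This part integrates the previous search and recommendation logic into an interactive system: the typed title is passed to search, and the movieId of the first result is passed to find_similar_movies.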

movie_name_input = widgets.Text(
    value='Toy Story',
    description='Movie Title:',
    disabled=False
)
recommendation_list = widgets.Output()

def on_type(data):
    with recommendation_list:
        recommendation_list.clear_output()
        title = data["new"]
        if len(title) > 5:
            results = search(title)
            movie_id = results.iloc[0]["movieId"]
            display(find_similar_movies(movie_id))

movie_name_input.observe(on_type, names='value')
display(movie_name_input, recommendation_list)

Output:

A table of up to 10 recommended movies is displayed, with the columns score, title, and genres. The movies are sorted by score, so those most strongly preferred by users similar to the viewer appear first.