Coders Packet

Book Recommender System using NLP

By Srinivasan

In this project, a content-based book recommender system is built using NLP in Python.

Overview and Explanation of the Project

In this project, a content-based book recommender system is built using NLP in Python. The plot and the author of each book are used to make recommendations: books with similar plots or authors are identified and recommended to the user. The recommender system takes as input the name of a book the user likes and returns two book recommendations.

Brief explanation of the data and the dataset used - 

The dataset used here is the CMU Book Summary Dataset, which has 16,559 rows (one per book) and 7 columns.

The data consists of the following columns - 

1. Wikipedia article ID
2. Freebase ID
3. Title of the book
4. Author of the book
5. Publication date
6. Genre of the book
7. Summary of the plot
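A minimal sketch of loading the dataset with pandas, assuming `booksummaries.txt` is tab-separated with one book per line and no header row. The column names in `COLUMNS` and the helper `load_books` are hypothetical labels chosen for this example, in the order listed above; they are not taken from the project's code.

```python
import csv

import pandas as pd

# Hypothetical column names, in the order of the columns listed above.
# The file is assumed to have no header row.
COLUMNS = [
    "wiki_id", "freebase_id", "title", "author",
    "pub_date", "genres", "summary",
]


def load_books(path_or_buffer) -> pd.DataFrame:
    """Read the tab-separated book summaries into a DataFrame."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",                # assumed field separator
        header=None,             # no header row, so names come from COLUMNS
        names=COLUMNS,
        quoting=csv.QUOTE_NONE,  # treat quote characters in summaries as literal text
    )
```

Empty fields (for example a missing author or publication date) are read as `NaN`. If the file matches the description above, `load_books("booksummaries.txt").shape` should report 16,559 rows and 7 columns.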

Necessary Dependencies for the project - 

The project is implemented in Python, and its dependencies are pandas, NumPy, NLTK, and scikit-learn.

The workflow of the project consists of the following stages - 

1. Loading and exploring the dataset - Exploring the dataset before manipulating it is essential: it makes the later preprocessing steps more precise and shows how each operation changes the data.

2. Preprocessing - The data is cleaned so that only the information useful for recommendation remains.

The Wikipedia article ID, Freebase ID, publication date, and genre of the book are dropped, since they are not used by the recommender system.
The following steps are then applied to the plot summary - 
1. Removal of punctuation
2. Extraction of keywords and key phrases using RAKE (Rapid Automatic Keyword Extraction)
3. Cleanup of the extracted keywords and phrases to remove commas and other separators
4. Merging of the author name and the processed summary into a new column that carries all the relevant information

3. Transformation of data and further steps - A TF-IDF vectorizer converts the merged text of each book into a numeric vector, so that the similarity between books can be measured with cosine similarity.

4. Recommendation - The recommendation function takes the title of a book from the user, computes the cosine similarity between that book and every other book, and recommends the two most similar books.
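The preprocessing, TF-IDF, and recommendation stages above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's own code: it assumes a DataFrame with `title`, `author`, and `summary` columns; `rake_keywords`, `prepare`, `build_tfidf`, and `recommend` are hypothetical names; and in place of the Rake library used in the project, a small hand-written RAKE is used (candidate phrases split at stopwords, scored by word degree over frequency), with scikit-learn's built-in English stopword list standing in for NLTK's.

```python
from __future__ import annotations

import string
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rake_keywords(text: str, top_n: int = 10) -> list[str]:
    """Simplified RAKE: candidate phrases are runs of non-stopwords,
    each scored by the sum of its words' degree / frequency."""
    # Remove punctuation and lowercase before splitting into words.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    phrases, current = [], []
    for word in cleaned.split():
        if word in ENGLISH_STOP_WORDS:  # a stopword ends the current phrase
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)
    # Degree sums the lengths of the phrases a word appears in.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def prepare(books: pd.DataFrame) -> pd.DataFrame:
    # Keep only the columns the recommender uses.
    books = books[["title", "author", "summary"]].dropna(subset=["title", "summary"])
    books = books.drop_duplicates(subset="title").reset_index(drop=True)
    # Collapse each author name into one token, e.g. "Jane Austen" -> "janeausten".
    author = books["author"].fillna("").str.lower().str.replace(r"[^a-z0-9]", "", regex=True)
    # Join the extracted phrases with spaces, so no commas or list syntax remain.
    keywords = books["summary"].apply(lambda s: " ".join(rake_keywords(s)))
    books["bag_of_words"] = author + " " + keywords
    return books


def build_tfidf(books: pd.DataFrame):
    # One TF-IDF row vector per book, fitted on the merged author + keyword text.
    return TfidfVectorizer().fit_transform(books["bag_of_words"])


def recommend(title: str, books: pd.DataFrame, tfidf, n: int = 2) -> list[str]:
    matches = np.flatnonzero(books["title"].to_numpy() == title)
    if matches.size == 0:
        raise ValueError(f"unknown title: {title!r}")
    idx = matches[0]
    # Cosine similarity between the chosen book and every book, itself included.
    scores = cosine_similarity(tfidf[idx : idx + 1], tfidf).ravel()
    scores[idx] = -1.0  # exclude the input book from its own recommendations
    best = np.argsort(scores)[::-1][:n]
    return books["title"].iloc[best].tolist()
```

Typical use would be `books = prepare(raw)` followed by `recommend("Some Title", books, build_tfidf(books))`, which returns the two titles with the highest similarity scores. Collapsing each author name into a single token is a design choice of this sketch: books by the same author share that token, while a common first name alone does not create an overlap.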

Important files in the project - 

1. - Contains the source code for the project.
2. booksummaries.txt - Contains the dataset used in the project.

Output of the above model - 

