Plagiarism checker using Python

By Swapnil

A Python script that detects plagiarism in text documents using the basic concept of a vector dot product, that is, cosine similarity.


It scores how similar two text documents are by taking the cosine similarity of their vectors, which equals the dot product once the vectors are normalized to unit length.
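As a rough illustration of the idea (not the script's own code), cosine similarity can be sketched in plain Python. The function name cosine_similarity_manual and the toy vectors are made up for this example.

```python
import math


def cosine_similarity_manual(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy word-count vectors (made-up numbers, for illustration only).
same_direction = cosine_similarity_manual([1, 2, 0], [2, 4, 0])
no_overlap = cosine_similarity_manual([1, 0, 0], [0, 3, 0])
print(same_direction, no_overlap)
```

Vectors that point in the same direction score close to 1 regardless of their length, and vectors with no shared non-zero component score 0.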

Computers operate on numbers rather than words, so the text must first be converted into numeric vectors (word embedding).
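A minimal sketch of that conversion, assuming scikit-learn's TfidfVectorizer with its default settings. The sample sentences are made up for illustration and are not taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sample sentences, used only for illustration.
documents = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]

vectorizer = TfidfVectorizer()
# Sparse matrix with one row per document and one column per distinct word.
tfidf_matrix = vectorizer.fit_transform(documents)

# vocabulary_ maps each word to its column index in the matrix.
vocabulary = sorted(vectorizer.vocabulary_)
print(tfidf_matrix.shape)
print(vocabulary)
```

Each document becomes one row of weights, so documents that share many words end up with vectors pointing in similar directions.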



pip install -U scikit-learn



Code is in Python


Libraries used:

os module to build the paths of the text files

TfidfVectorizer to perform word embedding on the text data

cosine_similarity to compute the similarity score between documents
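The three pieces might fit together roughly as follows. This is a sketch under assumptions, not the notebook's actual code: the helper pairwise_similarity, the temporary directory, and the essay file names and contents are all hypothetical.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def pairwise_similarity(directory):
    """Return (file names, similarity matrix) for every .txt file in directory."""
    names = sorted(f for f in os.listdir(directory) if f.endswith(".txt"))
    texts = []
    for name in names:
        with open(os.path.join(directory, name), encoding="utf-8") as handle:
            texts.append(handle.read())
    vectors = TfidfVectorizer().fit_transform(texts)
    # Entry [i, j] is the cosine similarity between file i and file j.
    return names, cosine_similarity(vectors)


# Demo with throwaway files; the names and contents are made up.
with tempfile.TemporaryDirectory() as demo_dir:
    samples = {
        "essay_a.txt": "Plagiarism is presenting someone else's work as your own.",
        "essay_b.txt": "Plagiarism means presenting someone else's work as your own.",
        "essay_c.txt": "Bananas are rich in potassium and easy to digest.",
    }
    for name, text in samples.items():
        with open(os.path.join(demo_dir, name), "w", encoding="utf-8") as handle:
            handle.write(text)

    names, scores = pairwise_similarity(demo_dir)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {scores[i, j]:.2f}")
```

The two near-identical essays should score much higher against each other than either does against the unrelated one, which shares no words with them.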


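To run:

$ jupyter notebook Plag_checker.ipynb

Plag_checker.ipynb is a Jupyter notebook, so it is opened in Jupyter (installable with pip install notebook) and its cells are run from there; passing it to python3 directly does not work.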

To run this code, your documents must be in the project directory and have the .txt extension.

Replace file1.txt with the name of the file to be checked for plagiarism, and file2.txt with the name of the file it should be compared against.
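For reference, a two-file comparison along these lines could look like the sketch below. The function compare_files is a hypothetical helper, and the demo writes its own file1.txt and file2.txt into a temporary directory so that it runs anywhere; in practice the paths would point at the files in your project directory.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def compare_files(checked_path, reference_path):
    """Return the cosine similarity (0.0 to 1.0) between two text files."""
    texts = []
    for path in (checked_path, reference_path):
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    vectors = TfidfVectorizer().fit_transform(texts)
    # Slicing keeps each row two-dimensional, as cosine_similarity expects.
    return float(cosine_similarity(vectors[0:1], vectors[1:2])[0, 0])


# Demo files with made-up contents, created in a throwaway directory.
with tempfile.TemporaryDirectory() as project_dir:
    file1 = os.path.join(project_dir, "file1.txt")  # file to be checked
    file2 = os.path.join(project_dir, "file2.txt")  # file to compare against
    with open(file1, "w", encoding="utf-8") as handle:
        handle.write("The quick brown fox jumps over the lazy dog.")
    with open(file2, "w", encoding="utf-8") as handle:
        handle.write("A quick brown fox jumped over a lazy dog.")
    score = compare_files(file1, file2)

print(f"Similarity: {score:.0%}")
```

A score near 100% suggests the two files use almost the same words, while a score near 0% suggests little shared vocabulary. Word order is ignored by this approach.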

