Python script to detect plagiarism in a text document using cosine similarity, which is based on the dot product of two vectors.
Cosine similarity (the dot product of two document vectors divided by the product of their lengths) is used to measure how similar the document being checked is to another document.
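As a minimal sketch of the idea, the snippet below computes cosine similarity by hand on plain Python lists standing in for document vectors. The function name cosine_sim is hypothetical and is not part of the original script.

```python
import math


def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An all-zero vector (e.g. an empty document) has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_sim([1, 2, 0], [2, 4, 0]))  # same direction, approximately 1.0
print(cosine_sim([1, 0, 0], [0, 1, 0]))  # orthogonal, 0.0
```

Because TF-IDF weights are never negative, the score for two documents falls between 0 (no weighted terms in common) and 1 (vectors pointing in the same direction).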
Since computers work with numbers rather than raw text, the textual data must first be converted into numeric vectors (word embedding).
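To illustrate how text becomes numbers, here is a simplified, hand-rolled TF-IDF sketch. It assumes whitespace tokenization and the textbook weighting tf = raw count, idf = log(N / df); the function name tfidf_vectors is hypothetical, and scikit-learn's TfidfVectorizer uses a smoothed idf and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a TF-IDF vector over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    token_sets = [set(tokens) for tokens in tokenized]
    vocab = sorted(set().union(*token_sets))
    n_docs = len(docs)
    # Document frequency: number of documents each word appears in.
    df = {word: sum(1 for s in token_sets if word in s) for word in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] * math.log(n_docs / df[word]) for word in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(["the cat sat", "the dog sat down"])
print(vocab)  # ['cat', 'dog', 'down', 'sat', 'the']
print(vectors)
```

With only two documents, words that appear in both ("the", "sat") get an idf of log(2/2) = 0 under this weighting, which is one reason TfidfVectorizer smooths the idf by default.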
pip install -U scikit-learn
The code is written in Python and uses:
os module for locating and loading the text files
TfidfVectorizer to convert the textual data into TF-IDF vectors (word embedding)
cosine similarity to compute the plagiarism (similarity) score between the documents
To run:
Place your documents in the project directory as plain-text files with the .txt extension.
In the script, replace file1.txt with the name of the file to be checked for plagiarism, and file2.txt with the name of the file it should be compared against.
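A minimal sketch of what such a script could look like is shown below. The constants FILE_TO_CHECK and REFERENCE_FILE and the helpers read_text and plagiarism_score are hypothetical names chosen for illustration; the original script may be organized differently.

```python
import os
import sys

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical names: replace these with your own .txt files in the project directory.
FILE_TO_CHECK = "file1.txt"   # document being checked for plagiarism
REFERENCE_FILE = "file2.txt"  # document it is compared against


def read_text(file_name, directory=None):
    """Read a text file from `directory` (default: the current working directory)."""
    path = os.path.join(directory or os.getcwd(), file_name)
    with open(path, encoding="utf-8") as f:
        return f.read()


def plagiarism_score(text_a, text_b):
    """Return the cosine similarity (0.0 to 1.0) of the two texts' TF-IDF vectors."""
    tfidf = TfidfVectorizer().fit_transform([text_a, text_b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


def main():
    try:
        text_a = read_text(FILE_TO_CHECK)
        text_b = read_text(REFERENCE_FILE)
    except FileNotFoundError as err:
        print(f"Could not open input file: {err}", file=sys.stderr)
        return
    score = plagiarism_score(text_a, text_b)
    print(f"{FILE_TO_CHECK} vs {REFERENCE_FILE}: {score:.2%} similar")


if __name__ == "__main__":
    main()
```

Run it from the project directory with `python <script name>.py`. Note that TfidfVectorizer raises a ValueError if both files are empty, because there is no vocabulary to build.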