Implementing Natural Language Processing in Python using the Natural Language Toolkit (NLTK) library, the Naive Bayes classifier from scikit-learn, and TF-IDF for weighting and normalization.
The project is written in Python 3. We will perform natural language processing using NLTK (Natural Language Toolkit), a Python library for symbolic and statistical NLP, primarily for the English language.
For weighting and normalization, we will use the TF-IDF method, applied through scikit-learn's TfidfTransformer.
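As a rough illustration of how TfidfTransformer is typically used, here is a minimal sketch. It assumes the raw text is first converted to token counts with scikit-learn's CountVectorizer; the three-sentence corpus is a made-up example and is not taken from the project.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus, for illustration only.
docs = ["the cat sat", "the dog barked", "the cat purred"]

# Step 1: turn raw text into a sparse matrix of token counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Step 2: reweight the counts with TF-IDF.
# With default settings each row is L2-normalized.
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(counts)

# Column order follows the alphabetically sorted vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(tfidf.toarray().round(3))
```

The resulting matrix has one row per document and one column per vocabulary term, and it can be passed directly to a classifier such as scikit-learn's MultinomialNB.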
TF-IDF (term frequency-inverse document frequency) assigns each word a weight that is often used in information retrieval. This weight measures how important a word is to a document within a collection, or corpus: it grows with the number of times the word appears in the document and shrinks as the word appears in more documents of the corpus.
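To make the weighting concrete, here is a small pure-Python sketch of the textbook formula tf * log(N / df). This is an illustrative assumption, not the project's code: the function name tf_idf and the toy corpus are hypothetical, and scikit-learn's TfidfTransformer uses a smoothed IDF with row normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
docs = ["the cat sat", "the dog barked", "the cat purred"]
weights = tf_idf(docs)
print(weights[0])
```

In this sketch, "the" appears in every document and so gets a weight of zero, while "sat", which appears in only one document, gets a higher weight than "cat", which appears in two.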
pip install nltk
pip install pandas
pip install matplotlib
pip install seaborn
pip install scikit-learn
We will also need the stopwords corpus from NLTK. To install it, run the following in the Python CLI:
import nltk
nltk.download()
After this, a pop-up window appears.
First, click on Corpora, then select stopwords, and after that click on Download. As I have already installed it on my system, it shows as installed.
You can also carry out the above installation from the ipynb notebook; the steps are specified there as well.
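If you prefer to skip the pop-up, for example inside a notebook, the corpus can also be fetched by name. The following is a minimal sketch, not the project's own code: the helper name ensure_stopwords is hypothetical, and it assumes an internet connection the first time it runs.

```python
import nltk


def ensure_stopwords():
    """Download the NLTK stopwords corpus if missing and return the English list as a set."""
    try:
        # Raises LookupError when the corpus is not installed locally.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords")

    from nltk.corpus import stopwords
    return set(stopwords.words("english"))
```

Calling english_stopwords = ensure_stopwords() once at the start of the notebook gives you a set that can be used to filter tokens before vectorizing.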
Awesome! Now you are ready to get started.
Submitted by Adarsh Hiremath (adarshhiremath)