Coders Packet

Word tokenizing and comparison with and without nltk in sentences/paragraphs in Python

By Pusuluri Sidhartha Aravind

A Python project that performs word tokenization and stop word removal, both with and without the nltk library.

INTRODUCTION:

This project removes stop words in several ways: using the nltk library on a sentence or paragraph, and without the nltk library on a sentence or paragraph. It can also compare the words left after stop word removal with nltk against the words left without it, again for a sentence or paragraph.

REQUIREMENTS:


1. System should have Python 2.6 or above installed.

2. System should have nltk installed.

3. System should have numpy installed (optional, only needed to work with arrays in addition to the work done here).

Installation guide:

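a. Install Python

b. Install pip

c. From the command line, run: pip install numpy

d. From the command line, run: pip install nltk

e. In a Python session, run: import nltk

f. In the same Python session, run: nltk.download()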

 


INPUT FORMAT:


The program reads the text to process from the variable named text1. You can replace its value with your own sentence or paragraph to get a different output.

You can also choose which stop words are removed from the text by replacing the contents of the array named stop_words.

At run time, the program asks for a number in the range 1-6 to select the operation to perform.
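As a rough illustration of the numeric input described above, the sketch below validates a menu choice in the range 1-6. The function name parse_choice and its parameters are assumptions made for this example and are not taken from the project's source.

```python
def parse_choice(raw, low=1, high=6):
    """Return the menu choice as an int, or None if it is not a whole number in [low, high]."""
    try:
        choice = int(raw.strip())
    except ValueError:
        # Input such as "two" or "" is not a number
        return None
    return choice if low <= choice <= high else None


print(parse_choice("5"))    # 5
print(parse_choice("9"))    # None
print(parse_choice("two"))  # None
```

In an interactive script this could be called as parse_choice(input("Enter a number (1-6): ")), asking again whenever the result is None.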

 


OUTPUT FORMAT:

 

[Screenshot: inp1]

The program shows a prompt in the format above and expects the input as a number.

[Screenshot: option1]

The screenshot above shows the result when the input is 1: stop word removal in a sentence without the use of nltk.
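Stop word removal without nltk can be sketched with the standard library alone. This is a minimal example, not the project's actual code: the sample values of text1 and stop_words, the function name remove_stopwords_plain, and the regular expression used for tokenizing are all assumptions.

```python
import re

# Hypothetical sample values; replace them with your own text and stop words.
text1 = "This is a sample sentence, showing off the stop words filtration."
stop_words = ["this", "is", "a", "off", "the"]


def remove_stopwords_plain(text, stop_words):
    """Tokenize with a regular expression and drop stop words (case-insensitive)."""
    # Keep runs of letters, digits and apostrophes; punctuation is discarded
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    blocked = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in blocked]


print(remove_stopwords_plain(text1, stop_words))
# ['sample', 'sentence', 'showing', 'stop', 'words', 'filtration']
```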

[Screenshot: option2]

The screenshot above shows the result when the input is 2: stop word removal in a sentence using nltk.
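A comparable sketch using nltk is shown below. It assumes nltk's wordpunct_tokenize, which is regex-based and needs no downloaded data; the project itself may use a different tokenizer. The sample values and the function name remove_stopwords_nltk are hypothetical, and the except branch exists only so the sketch still runs where nltk is not installed.

```python
import re

try:
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback so the sketch runs without nltk; it uses the same
    # pattern as nltk's WordPunctTokenizer.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

# Hypothetical sample values; replace them with your own text and stop words.
# After downloading the "stopwords" corpus, nltk.corpus.stopwords.words("english")
# can supply a ready-made list instead.
text1 = "This is a sample sentence, showing off the stop words filtration."
stop_words = ["this", "is", "a", "off", "the"]


def remove_stopwords_nltk(text, stop_words):
    """Tokenize with wordpunct_tokenize, then drop punctuation and stop words."""
    tokens = wordpunct_tokenize(text)
    blocked = {w.lower() for w in stop_words}
    # isalnum() filters out punctuation tokens such as "," and "."
    return [t for t in tokens if t.isalnum() and t.lower() not in blocked]


print(remove_stopwords_nltk(text1, stop_words))
# ['sample', 'sentence', 'showing', 'stop', 'words', 'filtration']
```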

[Screenshot: option5]

The screenshot above shows the result when the input is 5: stop word removal in a sentence is done both with and without nltk, and the two results are compared to check whether they are the same (True = same, False = not same).
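The comparison step can be illustrated as a plain equality check between the two word lists. The function name compare_results and the sample lists are assumptions for this sketch; the second pair shows how a False result can arise when two tokenizers split an apostrophe differently.

```python
def compare_results(without_nltk, with_nltk):
    """Return True when both word lists contain the same words in the same order."""
    return without_nltk == with_nltk


# Hypothetical word lists left after stop word removal
same_a = ["sample", "sentence", "showing"]
same_b = ["sample", "sentence", "showing"]
print(compare_results(same_a, same_b))  # True

diff_a = ["Don't", "stop", "music"]
diff_b = ["Don", "t", "stop", "music"]
print(compare_results(diff_a, diff_b))  # False
```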

CONCLUSION:

The result is the list of words that remain after the stop words are removed from the provided sentence/paragraph.


Submitted by Pusuluri Sidhartha Aravind (aravindpusuluri)
