Coders Packet

Basic ChatBot using NLTK in Python

By Anoushka Mergoju

This project aims to build a basic chatbot in Python using NLTK. It's a simple bot but a great way for one to understand NLP.

About NLP

Using NLP (Natural Language Processing), computers can analyze, understand and derive meaning from human language in an effective way. NLP acts as a great way for developers to progress into tasks pertaining to speech recognition, sentiment analysis, automatic summarization, translation as well as named entity recognition.

#import libraries
import io
import random
import string
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In the above code snippet, we imported Python libraries such as NumPy and scikit-learn; TfidfVectorizer and cosine_similarity will let us vectorize the text and measure how close a user's query is to the sentences in our corpus.

Now, let's install NLTK. NLTK or Natural Language Toolkit is a platform used in Python programming to work with human language.

pip install nltk

#importing nltk packages
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True)

In the above code snippet, we imported the nltk packages and downloaded NLTK's "popular" data collection; nltk.download() returns the boolean value True on success.
Now, let's read in the corpus. By this, we mean using text taken from a web page (here: Wikipedia) as the material our chatbot will draw its answers from.
from google.colab import drive
drive.mount('/content/drive')

f = open('/content/drive/My Drive/chatbot.txt', 'r', errors='ignore')
raw = f.read()
raw = raw.lower()

In the above code snippet, we placed our corpus text into a text file called chatbot.txt and mounted our Google Drive so the script can open it. Reading the file gives us the raw text, and raw.lower() converts it to lowercase. This matters because for any NLP project, we need to pre-process the text to make it effective to work with. Text pre-processing basically involves:

  • Converting the whole text to either upper-case or lower-case
  • Tokenization

The NLTK package contains a pre-trained Punkt tokenizer for English.

sent_tokens = nltk.sent_tokenize(raw)
word_tokens = nltk.word_tokenize(raw)

In the above code snippets,

  • sent_tokens converts text into a list of sentences
  • word_tokens converts text into a list of words

lem = nltk.stem.WordNetLemmatizer()

def LemTokens(tokens):
    return [lem.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

In the above code snippet, we define two helper functions: LemTokens, which takes a list of tokens and returns their lemmas, and LemNormalize, which lowercases the input text, strips punctuation, tokenizes it, and then lemmatizes the tokens.
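The translate() trick used inside LemNormalize is worth a closer look. A small standalone sketch of just the punctuation-removal step (the sample string is hypothetical):

```python
import string

# map every punctuation character's code point to None, i.e. delete it
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

cleaned = "hello, world!!".translate(remove_punct_dict)
print(cleaned)  # → hello world
```

Building the translation table once and reusing it is cheaper than scanning for each punctuation character on every call.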

#Keyword Matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey",)
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

We can see above that for every input greeting, we have a standardized response. This forms the base for initiating conversation with our chatbot.
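To check the matcher, we can call greeting() directly. The exact reply is random, but it always comes from GREETING_RESPONSES; a sentence with no greeting word returns None (the sample inputs below are our own):

```python
import random

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    # return a canned reply if any word in the sentence is a known greeting
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

print(greeting("Hello there, Bob"))  # one of GREETING_RESPONSES
print(greeting("goodbye forever"))   # None - no greeting word found
```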

# Cosine Similarity
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if(req_tfidf == 0):
        robo_response = robo_response+"I'm sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response+sent_tokens[idx]
        return robo_response

Here, we temporarily append the user's query to sent_tokens, vectorize everything with TF-IDF, and pick the corpus sentence most similar to the query. We look at the second-highest score (flat[-2] and the index at argsort()[0][-2]) because the highest one is always the query matched against itself; if even that score is 0, the bot admits it doesn't understand.
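To build intuition for what response() is doing, here is a self-contained sketch of the TF-IDF plus cosine-similarity step using scikit-learn's default tokenizer; the toy corpus and query are hypothetical, not the chatbot.txt corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "A chatbot is a software application.",
    "Python is a programming language.",
    "Cosine similarity measures the angle between vectors.",
]
query = "what is a chatbot"

# vectorize corpus + query together so they share one vocabulary
tfidf = TfidfVectorizer().fit_transform(corpus + [query])

# similarity of the query (last row) against every corpus sentence
vals = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
best = vals.argmax()
print(corpus[best])  # → A chatbot is a software application.
```

The sentence sharing the rare word "chatbot" with the query wins, because TF-IDF weights rare terms more heavily than common ones like "is".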

Now, let's write code for our conversation initiation and ending. This is the fun part.

flag = True
print("BOBtheRobo: My name is Bob. I'm here if you wanna chat. If you don't, type Bye!")
while(flag):
    user_response = input().lower()
    if(user_response == 'bye'):
        flag = False
        print("BOBtheRobo: Bye! See ya later")
    elif(user_response == 'thanks' or user_response == 'thank you'):
        flag = False
        print("BOBtheRobo: You are welcome!")
    elif(greeting(user_response) != None):
        print("BOBtheRobo: "+greeting(user_response))
    else:
        print("BOBtheRobo: "+response(user_response))
        sent_tokens.remove(user_response)

This way, we can build a basic chatbot and understand how NLP works.

Thank you.


