This project builds a basic chatbot in Python using NLTK. It is a simple bot, but a great way to understand the fundamentals of NLP.
About NLP
NLP (Natural Language Processing) lets computers analyze, understand, and derive meaning from human language. It underpins tasks such as speech recognition, sentiment analysis, automatic summarization, machine translation, and named entity recognition.
#import libraries
import io
import random
import string
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
warnings.filterwarnings('ignore')
In the above code snippet, we import NumPy and the pieces of scikit-learn we need: TfidfVectorizer to turn sentences into TF-IDF vectors and cosine_similarity to compare them.
Now, let's install NLTK. NLTK (Natural Language Toolkit) is a platform for building Python programs that work with human language data.
pip install nltk
#importing nltk packages
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True)
from google.colab import drive
drive.mount('/content/drive')
f = open('/content/drive/My Drive/chatbot.txt', 'r', errors='ignore')
raw = f.read()
raw = raw.lower()
In the above code snippets, we mounted Google Drive and loaded our corpus from a text file, here called chatbot.txt. f.read() reads the file into the string raw, and raw.lower() converts the text to lowercase. For any NLP project, we need to pre-process the raw text to make it effective to work with. Here, text pre-processing involves converting the text to lowercase, tokenizing it, removing punctuation, and lemmatizing the tokens.
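If you are not working in Google Colab, you can skip the Drive mount and read a local copy of the corpus instead (the path below is just an assumption; point it at wherever your chatbot.txt lives):

#local alternative to the Colab snippet above
with open('chatbot.txt', 'r', errors='ignore') as f:
    raw = f.read().lower()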
The NLTK package contains a pre-trained Punkt tokenizer for English.
#Tokenization
sent_tokens = nltk.sent_tokenize(raw)
word_tokens = nltk.word_tokenize(raw)
In the above code snippet, we split the raw corpus into a list of sentences (sent_tokens) and a list of words (word_tokens).
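As a quick illustration, here is how the two tokenizers behave on a made-up sample string (not our corpus):

sample = "Hello there. How are you today?"
print(nltk.sent_tokenize(sample))
#['Hello there.', 'How are you today?']
print(nltk.word_tokenize(sample))
#['Hello', 'there', '.', 'How', 'are', 'you', 'today', '?']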
#Pre-processing
lem = nltk.stem.WordNetLemmatizer()
def LemTokens(tokens):
    return [lem.lemmatize(token) for token in tokens]
#map every punctuation character to None for str.translate
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
In the above code snippet, we define LemTokens, which lemmatizes a list of tokens, and LemNormalize, which lowercases a text, strips its punctuation via remove_punct_dict, tokenizes it, and lemmatizes the resulting tokens. LemNormalize is what we will later hand to TfidfVectorizer as its tokenizer.
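For example (a made-up input; the exact output can vary slightly across NLTK versions), LemNormalize produces clean, lemmatized tokens:

print(LemNormalize("The bots are learning new languages!"))
#expected: ['the', 'bot', 'are', 'learning', 'new', 'language']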
#Keyword Matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey",)
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
We can see above that whenever the user's input contains a greeting word, the bot replies with a randomly chosen canned response. This forms the base for initiating a conversation with our chatbot.
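Since greeting() is self-contained, we can check it directly (its output is a random pick from GREETING_RESPONSES):

print(greeting("hey there"))          #e.g. 'hi there' (random choice)
print(greeting("tell me about nlp"))  #None - no greeting keyword found
#Note: str.split() keeps punctuation attached, so "hey!" would not match.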
# Cosine Similarity
def response(user_response):
    robo_response = ''
    #temporarily add the user's sentence so it gets vectorized with the corpus
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    #similarity of the user's sentence (last row) against every sentence
    vals = cosine_similarity(tfidf[-1], tfidf)
    #index -1 is the user's own sentence, so -2 is the closest corpus sentence
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if(req_tfidf == 0):
        robo_response = robo_response + "I'm sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response
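To see why response() picks the sentence it does, here is a self-contained sketch, using a made-up mini-corpus that is not part of the chatbot, of how TF-IDF plus cosine similarity ranks corpus sentences against a query. In response(), vals.argsort()[0][-2] plays the role that argmax() plays below: the highest score always belongs to the user's own sentence (similarity 1.0 with itself), so the second-highest is the best corpus match.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["a chatbot is a program that simulates conversation",
          "nltk is a toolkit for natural language processing",
          "cosine similarity measures the angle between two vectors"]
query = "what is a chatbot"
#vectorize the corpus together with the query; the last row is the query
tfidf = TfidfVectorizer(stop_words='english').fit_transform(corpus + [query])
scores = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
print(scores)                    #one similarity score per corpus sentence
print(corpus[scores.argmax()])   #best match: the chatbot sentence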
Now, let's write code for our conversation initiation and ending. This is the fun part.
flag = True
print("BOBtheRobo: My name is Bob. I'm here if you wanna chat. If you don't, type Bye!")
while(flag == True):
    user_response = input()
    user_response = user_response.lower()
    if(user_response != 'bye'):
        if(user_response == 'thanks' or user_response == 'thank you'):
            flag = False
            print("BOBtheRobo: You are welcome!")
        else:
            greet = greeting(user_response)  #store the reply so we don't draw twice
            if(greet != None):
                print("BOBtheRobo: " + greet)
            else:
                print("BOBtheRobo: ", end="")
                print(response(user_response))
                #drop the user's sentence that response() appended to the corpus
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("BOBtheRobo: Bye! See ya later")
This way, we can build a basic chatbot and get a feel for how NLP works.
Thank you.
Submitted by Anoushka Mergoju (Anoushka)