This project aims to build a basic chatbot in Python using NLTK. It's a simple bot, but a great way to understand the basics of NLP.
About NLP
With NLP (Natural Language Processing), computers can analyze, understand, and derive meaning from human language. NLP opens the door for developers to tasks such as speech recognition, sentiment analysis, automatic summarization, translation, and named entity recognition.
# import libraries
import io
import random
import string
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
warnings.filterwarnings('ignore')  # hide library warnings so the chat output stays readable
In the above code snippet, we import NumPy and scikit-learn. scikit-learn provides TfidfVectorizer and cosine_similarity, which we will use to match a user's input against sentences from our corpus.
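To get a feel for these two building blocks, here is a tiny self-contained sketch; the two sentences are made up purely for illustration:

docs = ["the cat sat on the mat", "the dog sat on the log"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)           # one TF-IDF row per document
sim = cosine_similarity(tfidf[0], tfidf)  # similarity of doc 0 to every doc
print(sim)                                # e.g. [[1.0, 0.33...]] - doc 0 matches itself perfectly

The closer a score is to 1, the more similar the two pieces of text are. This is exactly how the bot will pick its reply later.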
Now, let's install NLTK. NLTK (Natural Language Toolkit) is a platform for building Python programs that work with human language.
pip install nltk
# importing NLTK packages
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True)
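As a quick sanity check that the download worked, we can lemmatize a couple of sample words (the words here are arbitrary examples):

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("dogs"))          # dog
print(lemmatizer.lemmatize("running", "v"))  # run (lemmatized as a verb)

With NLTK ready, we can load our corpus.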
from google.colab import drive
drive.mount('/content/drive')

f = open('/content/drive/My Drive/chatbot.txt', 'r', errors='ignore')
raw = f.read()     # read the whole corpus into one string
raw = raw.lower()  # lowercase everything so matching is case-insensitive
In the above code snippet, we mount Google Drive and open our corpus file, which we're calling chatbot.txt. f.read() reads the file into a single string, and raw.lower() converts the text to lowercase. This matters because for any NLP project, we need to pre-process the text to make it effective to work with. Text pre-processing here basically involves converting the text to lowercase, tokenizing it into sentences and words, removing punctuation, and lemmatizing the tokens, which is exactly what we do next.
The NLTK package contains a pre-trained Punkt tokenizer for English.
# Tokenization
sent_tokens = nltk.sent_tokenize(raw)  # list of sentences
word_tokens = nltk.word_tokenize(raw)  # list of words
In the above code snippet, we split the raw corpus into sent_tokens, a list of sentences, and word_tokens, a list of words.
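For example, on a short made-up string the two tokenizers behave as follows:

sample = "Hello there. How are you?"
print(nltk.sent_tokenize(sample))  # ['Hello there.', 'How are you?']
print(nltk.word_tokenize(sample))  # ['Hello', 'there', '.', 'How', 'are', 'you', '?']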
# Pre-processing
lem = nltk.stem.WordNetLemmatizer()

def LemTokens(tokens):
    return [lem.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
In the above code snippet, we define LemTokens, which lemmatizes a list of tokens, and LemNormalize, which lowercases a piece of text, strips its punctuation, tokenizes it, and returns the lemmatized tokens. LemNormalize is the normalizer our TF-IDF vectorizer will use later.
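For instance, on a made-up sentence:

print(LemNormalize("The cats sat on the mats!"))
# ['the', 'cat', 'sat', 'on', 'the', 'mat'] - lowercased, punctuation stripped, plurals lemmatized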
# Keyword Matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
We can see above that for every recognized greeting, the bot picks one of a few standardized responses at random. This forms the base for initiating a conversation with our chatbot.
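A couple of illustrative calls (the inputs are made up):

print(greeting("hi there"))          # e.g. "hey" - picked at random
print(greeting("what is the time"))  # None - no greeting word found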
# Cosine Similarity
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        robo_response = robo_response + "I'm sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response
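In the above code snippet, the user's sentence is temporarily appended to sent_tokens, a TF-IDF matrix is built over the corpus plus the input, and the last row (the input itself) is compared against every sentence using cosine similarity. Since the input matches itself with a score of 1, vals.argsort()[0][-2] picks the second-best score, i.e. the closest real corpus sentence; if even that score is 0, the bot admits it doesn't understand. A toy illustration of the indexing, with made-up similarity scores:

vals = np.array([[0.1, 0.7, 0.0, 1.0]])  # last entry: the input compared with itself
idx = vals.argsort()[0][-2]              # index of the second-largest score
print(idx)                               # 1 -> score 0.7, the best real match

(The main loop below removes the user's sentence from sent_tokens again after each reply.)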
Now, let's write code for our conversation initiation and ending. This is the fun part.
flag = True
print("BOBtheRobo: My name is Bob. I'm here if you wanna chat. If you don't, type Bye!")
while flag == True:
    user_response = input()
    user_response = user_response.lower()
    if user_response != 'bye':
        if user_response == 'thanks' or user_response == 'thank you':
            flag = False
            print("BOBtheRobo: You are welcome!")
        else:
            if greeting(user_response) != None:
                print("BOBtheRobo: " + greeting(user_response))
            else:
                print("BOBtheRobo: ", end="")
                print(response(user_response))
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("BOBtheRobo: Bye! See ya later")
This way, we can build a basic chatbot and understand how NLP works.
Thank you.
Submitted by Anoushka Mergoju (Anoushka)