This project builds a basic chatbot in Python using NLTK. It is a simple bot, but a great way to understand the fundamentals of NLP.
About NLP
NLP (Natural Language Processing) lets computers analyze, understand, and derive meaning from human language. It underpins tasks such as speech recognition, sentiment analysis, automatic summarization, machine translation, and named entity recognition.
#import libraries
import io
import random
import string
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
warnings.filterwarnings('ignore')
In the above code snippet, we import NumPy and the pieces of scikit-learn we need: TfidfVectorizer to turn sentences into TF-IDF vectors and cosine_similarity to compare them.
Now, let's install NLTK. NLTK (Natural Language Toolkit) is a platform for building Python programs that work with human language data.
pip install nltk
#importing nltk packages
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True)
from google.colab import drive
drive.mount('/content/drive')
f = open('/content/drive/My Drive/chatbot.txt', 'r', errors='ignore')
raw = f.read()
raw = raw.lower()
In the above code snippets, we mounted Google Drive and loaded our corpus from a text file, here called chatbot.txt. f.read() reads the file into the string raw, and raw.lower() converts the text to lowercase. For any NLP project, we need to pre-process the raw text to make it effective to work with. Here, text pre-processing involves converting the text to lowercase, tokenizing it, removing punctuation, and lemmatizing the tokens.
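If you are not working in Google Colab, you can skip the Drive mount and read a local copy of the corpus instead (the path below is just an assumption; point it at wherever your chatbot.txt lives):

#local alternative to the Colab snippet above
with open('chatbot.txt', 'r', errors='ignore') as f:
    raw = f.read().lower()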
The NLTK package contains a pre-trained Punkt tokenizer for English.
#Tokenization
sent_tokens = nltk.sent_tokenize(raw)
word_tokens = nltk.word_tokenize(raw)
In the above code snippet, we split the raw corpus into a list of sentences (sent_tokens) and a list of words (word_tokens).
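As a quick illustration, here is how the two tokenizers behave on a made-up sample string (not our corpus):

sample = "Hello there. How are you today?"
print(nltk.sent_tokenize(sample))
#['Hello there.', 'How are you today?']
print(nltk.word_tokenize(sample))
#['Hello', 'there', '.', 'How', 'are', 'you', 'today', '?']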
#Pre-processing
lem = nltk.stem.WordNetLemmatizer()
def LemTokens(tokens):
    return [lem.lemmatize(token) for token in tokens]
#map every punctuation character to None for str.translate
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
In the above code snippet, we define LemTokens, which lemmatizes a list of tokens, and LemNormalize, which lowercases a text, strips its punctuation via remove_punct_dict, tokenizes it, and lemmatizes the resulting tokens. LemNormalize is what we will later hand to TfidfVectorizer as its tokenizer.
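For example (a made-up input; the exact output can vary slightly across NLTK versions), LemNormalize produces clean, lemmatized tokens:

print(LemNormalize("The bots are learning new languages!"))
#expected: ['the', 'bot', 'are', 'learning', 'new', 'language']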
#Keyword Matching
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey",)
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
We can see above that whenever the user's input contains a greeting word, the bot replies with a randomly chosen canned response. This forms the base for initiating a conversation with our chatbot.
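Since greeting() is self-contained, we can check it directly (its output is a random pick from GREETING_RESPONSES):

print(greeting("hey there"))          #e.g. 'hi there' (random choice)
print(greeting("tell me about nlp"))  #None - no greeting keyword found
#Note: str.split() keeps punctuation attached, so "hey!" would not match.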
# Cosine Similarity
def response(user_response):
    robo_response = ''
    #temporarily add the user's sentence so it gets vectorized with the corpus
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    #similarity of the user's sentence (last row) against every sentence
    vals = cosine_similarity(tfidf[-1], tfidf)
    #index -1 is the user's own sentence, so -2 is the closest corpus sentence
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if(req_tfidf == 0):
        robo_response = robo_response + "I'm sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response
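To see why response() picks the sentence it does, here is a self-contained sketch, using a made-up mini-corpus that is not part of the chatbot, of how TF-IDF plus cosine similarity ranks corpus sentences against a query. In response(), vals.argsort()[0][-2] plays the role that argmax() plays below: the highest score always belongs to the user's own sentence (similarity 1.0 with itself), so the second-highest is the best corpus match.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["a chatbot is a program that simulates conversation",
          "nltk is a toolkit for natural language processing",
          "cosine similarity measures the angle between two vectors"]
query = "what is a chatbot"
#vectorize the corpus together with the query; the last row is the query
tfidf = TfidfVectorizer(stop_words='english').fit_transform(corpus + [query])
scores = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()
print(scores)                    #one similarity score per corpus sentence
print(corpus[scores.argmax()])   #best match: the chatbot sentence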
Now, let's write code for our conversation initiation and ending. This is the fun part.
flag = True
print("BOBtheRobo: My name is Bob. I'm here if you wanna chat. If you don't, type Bye!")
while(flag == True):
    user_response = input()
    user_response = user_response.lower()
    if(user_response != 'bye'):
        if(user_response == 'thanks' or user_response == 'thank you'):
            flag = False
            print("BOBtheRobo: You are welcome!")
        else:
            greet = greeting(user_response)  #store the reply so we don't draw twice
            if(greet != None):
                print("BOBtheRobo: " + greet)
            else:
                print("BOBtheRobo: ", end="")
                print(response(user_response))
                #drop the user's sentence that response() appended to the corpus
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("BOBtheRobo: Bye! See ya later")
This way, we can build a basic chatbot and get a feel for how NLP works.
Thank you.
Submitted by Anoushka Mergoju (Anoushka)