A word cloud is a pictorial representation of the important keywords in a text, where the important keywords are the words that occur most frequently.
This code generates a Word Cloud from text or data uploaded as a '.txt' file. The text must be in plain-text (.txt) format, so first save the data you want to generate a word cloud for as a .txt file on your computer.
First, we need to install and import the required libraries: wordcloud, fileupload and ipywidgets, plus the Jupyter notebook extension for fileupload.
!pip install wordcloud
!pip install fileupload
!pip install ipywidgets
!jupyter nbextension install --py --user fileupload
!jupyter nbextension enable --py fileupload
import wordcloud
import numpy as np
from matplotlib import pyplot as plt
from IPython.display import display
import fileupload
import io
import sys
In this part of the code we create a widget labelled 'Browse'. Clicking it opens a dialogue box in which you can browse to the file for which you want to generate a Word Cloud.
Here we use *FileUploadWidget()* from the already imported *fileupload* library to create the "*Browse*" button.
The contents of your uploaded file are stored in the variable "*file_contents*" for further use in our code.
**Note: The uploaded file should contain only text (alphabetic characters), not integers.**
# This is the uploader widget
def _upload():
    _upload_widget = fileupload.FileUploadWidget()

    def _cb(change):
        global file_contents
        decoded = io.StringIO(change['owner'].data.decode('utf-8'))
        filename = change['owner'].filename
        print('Uploaded `{}` ({:.2f} kB)'.format(
            filename, len(decoded.read()) / 2 **10))
        file_contents = decoded.getvalue()

    _upload_widget.observe(_cb, names='data')
    display(_upload_widget)

_upload()
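If the 'Browse' widget does not appear (the fileupload notebook extension may not load in every Jupyter front end), the same "file_contents" variable can be filled by reading the .txt file directly from disk. The following is only a minimal sketch of that alternative, not part of the original project: the helper name `load_text` and the file name `sample.txt` are hypothetical, and the sample file is created here just so the sketch runs as-is.

```python
from pathlib import Path


def load_text(path):
    """Read a UTF-8 .txt file and return its contents as a string.

    Hypothetical helper, not part of the fileupload library.
    """
    path = Path(path)
    # Mirror the tutorial's requirement that the input is a .txt file.
    if path.suffix.lower() != ".txt":
        raise ValueError("expected a .txt file, got: {}".format(path.name))
    return path.read_text(encoding="utf-8")


# "sample.txt" is a placeholder; it is written here only for demonstration.
Path("sample.txt").write_text("word clouds show frequent words", encoding="utf-8")
file_contents = load_text("sample.txt")
print("Loaded {} characters".format(len(file_contents)))
```

Replace "sample.txt" with the path to your own file; the rest of the code can then use "file_contents" exactly as it would after an upload.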
Here we calculate the frequencies of the words in your uploaded text, "file_contents", in the following steps:
1) First, remove punctuation marks, if any.
2) Then remove the "uninteresting_words" listed in the code.
3) Then remove integers, if any.
After that, build a dictionary whose keys are the words that remain in your text after removing punctuation marks and uninteresting words, and whose values are the frequencies of those words.
Then use the wordcloud library to generate a cloud of words from that dictionary.
def calculate_frequencies(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    # LEARNER CODE START HERE
    my_string = file_contents
    mod_str = "" #modified string
    for char in my_string:
        if char not in punctuations:
            mod_str+=char
    mod_str1 = mod_str.split()
    resultwords = [word for word in mod_str1 if word.lower() not in uninteresting_words]
    result = ' '.join(resultwords)
    result= result.split(" ")
    # print(res1)
    no_int = [x for x in result if not (x.isdigit()
        or x[0] == '-' and x[1:].isdigit())]
    result=no_int
    res=dict((x,result.count(x)) for x in set(result))
    freq = {}#frequencies of words
    for k,v in res.items():
        freq[k] = int(v)
    #wordcloud
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(freq)
    return cloud.to_array()
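The three cleaning steps and the frequency dictionary can also be sketched with the standard library alone, which makes it easy to check the counts before any image is drawn. This is an illustrative alternative, not the code used above: the function name `count_words` and the shortened `STOP_WORDS` set are hypothetical, and `string.punctuation` covers a slightly larger set of characters than the "punctuations" string in the original function. As in the original, words are counted case-sensitively.

```python
import string
from collections import Counter

# Hypothetical, shortened stop-word set for illustration only;
# the full uninteresting_words list could be passed instead.
STOP_WORDS = {"the", "a", "to", "is", "it", "of", "and"}


def count_words(text, stop_words=STOP_WORDS):
    """Return a {word: frequency} dict for the interesting words in text."""
    # Step 1: delete every ASCII punctuation character in one pass.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    counts = Counter()
    for word in cleaned.split():
        # Steps 2 and 3: skip stop words (case-insensitively)
        # and tokens made up only of digits.
        if word.lower() in stop_words or word.isdigit():
            continue
        counts[word] += 1
    return dict(counts)


sample = "The cat and the hat: a cat is a cat, 42 times."
print(count_words(sample))  # {'cat': 3, 'hat': 1, 'times': 1}
```

A dictionary of this shape is what `generate_from_frequencies()` receives in the function above, so printing it first is a quick way to confirm that the cleaning steps behaved as expected.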
Finally, this code displays the word cloud for the given text or data using plt.imshow() and plt.show() from the "matplotlib" library.
# Display your wordcloud image
myimage = calculate_frequencies(file_contents)
plt.imshow(myimage, interpolation = 'nearest')
plt.axis('off')
plt.show()
**All the bold parts of the text are the code for our Word Cloud project.**
Submitted by Sridhar Vempati (vempatisridharbzy)