Coders Packet

Word Cloud Using Python

By Sridhar Vempati

A word cloud is a pictorial representation of the important keywords in a text. Here, the important keywords are the words that occur most frequently in that text.

A word cloud is a visualisation of text data that highlights the keywords occurring most frequently in the given text. This code generates a word cloud for text uploaded as a '.txt' file. The text must be in plain-text (.txt) format, so save the data you want to generate a word cloud for as a .txt file on your computer.

First, we need to install and import the required libraries: wordcloud, fileupload and ipywidgets, along with the Jupyter extension for fileupload.

 


!pip install wordcloud
!pip install fileupload
!pip install ipywidgets
!jupyter nbextension install --py --user fileupload
!jupyter nbextension enable --py fileupload


import wordcloud
import numpy as np
from matplotlib import pyplot as plt
from IPython.display import display
import fileupload
import io
import sys


In this part of the code we generate a widget named 'Browse'. Clicking this widget opens a dialogue box in which you can browse for the file you want to generate a word cloud for.


Here, we use *FileUploadWidget()* from the already imported *fileupload* library to generate that "*Browse*" button.


The text of your uploaded file is stored in a variable named "*file_contents*" for further use in our code.


**Note: The uploaded file must contain only text (alphabetic characters), no integers.**


# This is the uploader widget

def _upload():
    _upload_widget = fileupload.FileUploadWidget()

    def _cb(change):
        global file_contents
        decoded = io.StringIO(change['owner'].data.decode('utf-8'))
        filename = change['owner'].filename
        print('Uploaded `{}` ({:.2f} kB)'.format(
            filename, len(decoded.read()) / 2 **10))
        file_contents = decoded.getvalue()

    _upload_widget.observe(_cb, names='data')
    display(_upload_widget)

_upload()
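If the upload widget is not available (for example, when running outside a Jupyter notebook), the text can also be read straight from disk. The following is only a minimal sketch under that assumption: the helper name `read_text_file` and the file name in the usage comment are hypothetical and not part of the original project.

```python
from pathlib import Path


def read_text_file(path):
    """Read a UTF-8 encoded .txt file and return its contents as a string."""
    path = Path(path)
    # The tutorial expects plain-text input, so reject other extensions early
    if path.suffix.lower() != ".txt":
        raise ValueError("Expected a .txt file, got: " + path.name)
    return path.read_text(encoding="utf-8")


# Example usage (hypothetical file name):
# file_contents = read_text_file("my_text.txt")
```

The returned string can be assigned to file_contents in place of the value set by the widget callback.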

 

Here we calculate the frequencies of the words in your uploaded text, stored in "file_contents", in the following steps:

1) First, remove punctuation marks, if any.

2) Then remove the "uninteresting_words" listed in the code.

3) Then remove integers, if any.

After that, build a dictionary whose keys are the words left in your text after removing punctuation marks and uninteresting words, and whose values are the frequencies of those words.

Then use the wordcloud library to generate a cloud of words from that dictionary.
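The three steps above can be illustrated with a compact sketch that uses only the standard library. This is not the project's own function: the name `count_words`, the shortened stop-word set and the sample sentence are assumptions made for illustration, and `string.punctuation` is swapped in for the hand-written punctuation list.

```python
import string
from collections import Counter

# Shortened, illustrative stop-word set (an assumption, not the full project list)
STOP_WORDS = {"the", "a", "to", "is", "it", "of", "and"}


def count_words(text, stop_words=STOP_WORDS):
    """Return a dict mapping each kept word to its frequency."""
    # Step 1: remove punctuation marks
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: drop uninteresting words (compared in lower case)
    words = [w for w in cleaned.split() if w.lower() not in stop_words]
    # Step 3: drop tokens that consist only of digits
    words = [w for w in words if not w.isdigit()]
    return dict(Counter(words))


sample = "The cat and the hat: a cat, a hat, 42 cats!"
print(count_words(sample))  # {'cat': 2, 'hat': 2, 'cats': 1}
```

A dictionary of this shape (word to count) is what the word cloud is then generated from.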

def calculate_frequencies(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
    "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
    "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
    "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
    "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]
    
    # Step 1: remove punctuation marks
    mod_str = ""  # modified string
    for char in file_contents:
        if char not in punctuations:
            mod_str += char

    # Step 2: remove uninteresting words (compared in lower case)
    words = mod_str.split()
    result = [word for word in words if word.lower() not in uninteresting_words]

    # Step 3: remove integers (minus signs were already stripped as punctuation)
    result = [x for x in result if not x.isdigit()]

    # Build the dictionary of word frequencies
    freq = {}  # frequencies of words
    for word in result:
        freq[word] = freq.get(word, 0) + 1
        
    #wordcloud
    cloud = wordcloud.WordCloud()   
    cloud.generate_from_frequencies(freq)
    return cloud.to_array()


Finally, this code displays the word cloud for our given text using plt.imshow() and plt.show() from the "matplotlib" library.


# Display your wordcloud image


myimage = calculate_frequencies(file_contents)
plt.imshow(myimage, interpolation = 'nearest')
plt.axis('off')
plt.show()


All the code shown above makes up our Word Cloud project.

 

 


Submitted by Sridhar Vempati (vempatisridharbzy)
