Coders Packet

Extractive Text Summarization using Python

By Hitesh Velaga

In this packet I have created a text summarization pipeline in Python that takes text data (paragraphs) and condenses it to the length the user chooses.

For this I used a medical data set that contains information about various gels and tablets. Each data point is a paragraph made up of many sentences. First, I expanded the short forms into full words (for example, "aren't" becomes "are not") and removed all numeric and special characters from the data. Next, I split the text of each paragraph into a list of sentences and calculated the frequency of every word that is not a stop word. For this I used the NLTK library. I then calculated a score for each sentence, which is simply the sum of the frequencies of the words it contains. This score determines which sentences should be given higher priority. Finally, I created a function, summarize, that takes a value n specifying the number of sentences to include in the output.
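The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the packet: to stay self-contained it uses a small hard-coded stop-word set and a regular-expression sentence splitter in place of NLTK's stop-word corpus and sentence tokenizer. The lookup tables and every helper name other than summarize are assumptions made for the example. It also splits sentences before stripping punctuation, so that sentence boundaries survive the cleaning step.

```python
import heapq
import re
from collections import Counter

# Hypothetical, deliberately tiny lookup tables. A real pipeline would use
# fuller lists, for example NLTK's English stop-word corpus.
CONTRACTIONS = {"aren't": "are not", "isn't": "is not", "don't": "do not",
                "can't": "cannot", "it's": "it is"}
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "cannot", "do",
              "for", "in", "is", "it", "not", "of", "on", "or", "the", "this",
              "to", "with"}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known short form with its (lowercase) full form."""
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (no digits or symbols)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count every word that is not a stop word."""
    return Counter(w for w in tokenize(text) if w not in STOP_WORDS)


def summarize(text, n):
    """Return the n highest-scoring sentences, kept in their original order."""
    text = expand_contractions(text)
    sentences = split_sentences(text)
    freqs = word_frequencies(text)
    # Sentence score = sum of the frequencies of its words (stop words count 0).
    scores = [sum(freqs[w] for w in tokenize(s)) for s in sentences]
    top = heapq.nlargest(n, range(len(sentences)), key=scores.__getitem__)
    return " ".join(sentences[i] for i in sorted(top))


sample = ("The gel relieves pain. The gel isn't for children. "
          "Store it in a cool place. "
          "Apply the gel on the skin twice daily for pain relief.")
print(summarize(sample, 2))
```

Returning the selected sentences in their original order, rather than in score order, is a design choice of this sketch that tends to keep the summary readable. Because the score is a plain sum, longer sentences tend to score higher.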

Text summarization can also be done with various machine learning algorithms, but those require training a model, which in turn requires reference summaries for the text. If you do not have such summaries, you can use this approach instead.

Requirements:

Python, Jupyter Notebook, scikit-learn (sklearn), NLTK, heapq, re, NumPy, pandas.


Submitted by Hitesh Velaga (saihiteshvellaga)
