Coders Packet

Checking Tweets in Python Using Natural Language Processing (NLP)

By Tanveer Chawla

Checking tweets with natural language processing in Python can be used to censor inappropriate words.

Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, like speech and text, by software. Natural language refers to the way we humans communicate with each other, namely through speech and text. We are surrounded by text. Think about how much text you see each day:

  • Signs
  • Menus
  • Email
  • SMS
  • Web Pages
  • and so much more…

The list is endless. Now think about speech. We may speak to each other, as a species, more than we write. It may even be easier to learn to speak than to write. Voice and text are how we communicate with each other. Given the importance of this type of data, we must have methods to understand and reason about natural language, just like we do for other types of data.
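As a small illustration of software manipulating text automatically, a common first step is to split a piece of text such as a tweet into word tokens. The sketch below uses only Python's standard `re` module and assumes a deliberately simple rule: a token is an optional leading `#` or `@` (for hashtags and mentions) followed by word characters or apostrophes. The `tokenize` function and `TOKEN_PATTERN` are hypothetical names for illustration, not part of the project's code.

```python
import re

# Assumption: a token is an optional '#' or '@' followed by word characters or
# apostrophes. Dedicated tokenizers (e.g. NLTK's TweetTokenizer) handle many more cases.
TOKEN_PATTERN = re.compile(r"[#@]?[\w']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return the word-like tokens it contains."""
    return TOKEN_PATTERN.findall(text.lower())


print(tokenize("Loving the new #Python release, thanks @guido!"))
```

Punctuation such as commas and exclamation marks is dropped because it never matches the pattern.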

Classical linguistics involved devising and evaluating rules of language. Great progress was made on formal methods for syntax and semantics, but for the most part, the interesting problems in natural language understanding resist clean mathematical formalisms.

Broadly, a linguist is anyone who studies language, although, more colloquially, someone who calls themselves a linguist may be more focused on fieldwork.

Mathematics is the tool of science. Mathematicians working on natural language may refer to their study as mathematical linguistics, focusing exclusively on the use of discrete mathematical formalisms and theory for natural language (e.g. formal languages and automata theory).

As machine learning practitioners interested in working with text data, we are concerned with the tools and methods from the field of Natural Language Processing. The preceding paragraphs traced the path from linguistics to NLP.

Now, let’s take a look at how modern researchers and practitioners define what NLP is all about. In one of the more widely used textbooks, written by top researchers in the field, the authors refer to the subject as “linguistic science,” a term that permits discussion of both classical linguistics and modern statistical methods.

Many tasks are commonly researched in natural language processing. Some of these tasks have direct real-world applications, while others more commonly serve as subtasks that help solve larger tasks.

In this project, the bad words in a tweet are identified, so the analysis can serve as a censoring algorithm. A pool of words is provided through predefined libraries and can be extended manually.
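A minimal sketch of word-pool censoring, assuming a plain Python set as the pool. The project's actual predefined libraries are not named in the text, so `DEFAULT_BAD_WORDS` below holds hypothetical placeholder words, and `censor` is an illustrative function rather than the project's own code. Each whole word found in the pool is replaced by asterisks of the same length, and extending the pool manually amounts to taking the union with extra words.

```python
import re

# Hypothetical placeholder pool: the project loads its words from predefined
# libraries that are not reproduced here, so these mild words stand in for them.
DEFAULT_BAD_WORDS = frozenset({"darn", "heck"})

# Assumption: a word is a run of ASCII letters or apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def censor(tweet: str, bad_words: frozenset[str] = DEFAULT_BAD_WORDS, mask: str = "*") -> str:
    """Replace each word found in bad_words (case-insensitively) with mask characters."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Keep the word's length so the censored tweet keeps its shape.
        return mask * len(word) if word.lower() in bad_words else word

    return WORD_PATTERN.sub(replace, tweet)


# The pool is extended manually by taking the union with extra words.
custom_words = DEFAULT_BAD_WORDS | {"dang"}
print(censor("What the heck, this dang bug!", custom_words))
```

Matching whole words rather than substrings avoids flagging innocent words that merely contain a listed word, and a set lookup keeps the check fast as the pool grows.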

The analysis of good and bad words is also visualized through graphs and plots, and the project can be updated by expanding the provided dataset.
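Before plotting, the words have to be tallied. The sketch below assumes a simple definition: any word found in the pool counts as "bad" and every other word counts as "good". It uses a hypothetical placeholder pool and prints a text bar chart so that it runs with the standard library alone. `count_good_bad` and `text_bar_chart` are illustrative names, not the project's functions.

```python
import re

# Hypothetical placeholder pool; in the project the words come from predefined libraries.
BAD_WORDS = frozenset({"darn", "heck"})


def count_good_bad(tweets: list[str], bad_words: frozenset[str] = BAD_WORDS) -> dict[str, int]:
    """Count words across all tweets that are in bad_words ("bad") versus not ("good")."""
    counts = {"good": 0, "bad": 0}
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            counts["bad" if word in bad_words else "good"] += 1
    return counts


def text_bar_chart(counts: dict[str, int], width: int = 20) -> str:
    """Render counts as text bars, scaled so the largest bar is `width` characters long."""
    largest = max(counts.values(), default=0) or 1  # avoid dividing by zero
    lines = []
    for label, value in counts.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:>4} | {bar} {value}")
    return "\n".join(lines)


tweets = ["What the heck is this?", "Great game today", "Darn, missed the bus"]
counts = count_good_bad(tweets)
print(counts)  # {'good': 10, 'bad': 2}
print(text_bar_chart(counts))
```

The same dictionary could be handed to a plotting library such as matplotlib to draw a graphical bar chart instead of the text version.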
