Building an Emojifier in Python

By Hritwick Manna

In this tutorial, we will build a baseline model (Emojifier-V1) that uses word vector representations to map short sentences to the most fitting emoji.

INTRODUCTION

1) In this tutorial, we are going to build a basic emojifier for short sentences/words in Python using word vectors/word embeddings.

2) We take a sentence like "Congratulations on getting a job! Let's get tea and chat."

    The emojifier will automatically turn this into "Congratulations on getting a job! Let's get tea and chat. ☕️"

3) We want to make our sentences more expressive with emoji symbols, and the emojifier application will let us do exactly that.

4) We will design a model that takes a sentence (e.g. "Let's go to see the cricket game today!") and finds the most meaningful/best emoji for it (⚾️); a minimal sketch of the idea follows this list.
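To make the idea concrete, here is a minimal sketch of what the baseline Emojifier-V1 does: average the word vectors of a sentence and feed that average to a softmax classifier over a handful of emoji classes. The names word_to_vec_map, emoji_of, W and b are assumptions for illustration, not necessarily the exact names used in the packet.

import numpy as np

# Hypothetical emoji classes; the real project maps class indices to emoji labels.
emoji_of = {0: "❤️", 1: "⚾️", 2: "😄", 3: "😞", 4: "☕️"}

def sentence_to_avg(sentence, word_to_vec_map):
    # Average the word vectors of every known word in the sentence.
    # word_to_vec_map is assumed to map a lowercase word to a NumPy vector
    # (e.g. 50-dimensional GloVe embeddings).
    words = sentence.lower().split()
    dim = len(next(iter(word_to_vec_map.values())))
    avg = np.zeros(dim)
    count = 0
    for w in words:
        if w in word_to_vec_map:
            avg += word_to_vec_map[w]
            count += 1
    return avg / max(count, 1)

def predict_emoji(sentence, word_to_vec_map, W, b):
    # W (num_classes x embedding_dim) and b (num_classes,) are trained softmax
    # parameters; the emoji class with the highest probability is returned.
    avg = sentence_to_avg(sentence, word_to_vec_map)
    z = W @ avg + b
    probs = np.exp(z) / np.sum(np.exp(z))
    return emoji_of[int(np.argmax(probs))]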

Using Word Vectors/Word Embeddings to Make Emoji Prediction More Accurate

1) Note that in the emoji set we use, ❤️ is labelled the "heart" symbol rather than the "love" symbol, so an exact keyword lookup for "love" would never find it.
2) In simple words, writing "love" will not display ❤️ by itself; we need a representation that knows "love" and "heart" are closely related.
3) We will use word vectors/word embeddings to make this word-to-emoji mapping more accurate.
4) Even if our training set is small and only maps a few short sentences to each emoji, our application will still be able to identify the correct emoji on the test set.
5) This works even for short sentences/words that are not stored in our training set at all, because similar words get similar vectors.
6) Hence even a small training set allows us to build an accurate classification mapping; a sketch of loading the pretrained word vectors follows this list.
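To show where these word vectors come from, here is a small sketch that reads a pretrained GloVe text file into a dictionary from word to vector. The file name glove.6B.50d.txt is just the usual Stanford GloVe download used as an example; any GloVe-style text file with one word followed by its numbers per line will work.

import numpy as np

def read_glove_vectors(glove_file="glove.6B.50d.txt"):
    # Each line of a GloVe text file is: word value1 value2 ... valueN.
    # Build a dictionary mapping word -> NumPy vector.
    word_to_vec_map = {}
    with open(glove_file, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split()
            word_to_vec_map[parts[0]] = np.array(parts[1:], dtype=np.float64)
    return word_to_vec_map

Because words such as "happy", "cheerful" and "delighted" end up with nearby vectors, a model that has only seen "happy" during training can still label "delighted" with the right emoji.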

What Do We Mean by Word Vector/Word Embedding?

1) The terms word vector and word embedding mean almost the same thing.

2) A word embedding represents a word as a dense vector: instead of a high-dimensional one-hot vector, each word is mapped to a lower-dimensional vector whose values capture the features/semantics associated with it.

3) We use cosine similarity, that is, in layman's language, a distance-based approach, to measure how similar two words are and to find an unknown word related to them, e.g. actor - man + woman = actress.

4) In simple words, we subtract one meaning (male) from the word vector for "actor", add another (female), and the newly formed vector (actor - man + woman) lies closest to the word vector for "actress"; a small sketch of this analogy computation follows this list.
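Here is a minimal sketch of how cosine similarity can complete such an analogy; the word_to_vec_map dictionary is the same assumed structure as in the earlier sketches.

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between vectors u and v; 1.0 means "same direction".
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(a, b, c, word_to_vec_map):
    # Find the word d whose vector is closest to a - b + c, so that
    # complete_analogy("actor", "man", "woman", ...) should return "actress".
    target = word_to_vec_map[a] - word_to_vec_map[b] + word_to_vec_map[c]
    best_word, best_sim = None, -np.inf
    for word, vec in word_to_vec_map.items():
        if word in (a, b, c):
            continue
        sim = cosine_similarity(target, vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word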

Highlights

1) We are able to get a fair model even with a small training dataset.
2) This is all thanks to the generalisation power that word vectors give us.
3) The emojifier will not be able to perform well on sentences such as "This food is not good and not tasty".
4) This is because it does not understand combinations of words, such as negation.
5) It just averages all the words' embedding vectors together, without considering the ordering of words, as the small demonstration after this list shows.
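The small self-contained demonstration below uses toy 2-D embeddings, invented purely for illustration, to show why averaging loses the word order: any permutation of the same words produces exactly the same averaged vector, so the classifier cannot tell the sentences apart.

import numpy as np

# Toy 2-D embeddings made up for this demonstration; real GloVe vectors are 50-D or more.
toy_vec = {"not": np.array([1.0, 0.0]),
           "good": np.array([0.0, 1.0]),
           "tasty": np.array([0.5, 0.5])}

def avg_of(sentence):
    # Same averaging idea as the Emojifier baseline, on the toy vocabulary.
    words = sentence.split()
    return sum(toy_vec[w] for w in words) / len(words)

# Same words, different order: the averaged representations are identical.
print(np.allclose(avg_of("not good not tasty"), avg_of("good not tasty not")))  # True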

So that is all for this tutorial; I hope it helps most of you.
