By Ishan Gurtu
This project performs image captioning in Python using both NLP and CV techniques, achieving fair accuracy. The Flickr 8k dataset was used, and the model was trained using Inception V3 image features, a custom captioning model, and GloVe word vectors.
This code packet contains a Python script and a saved model.h5 file holding the trained weights that worked best for this dataset.
Image captioning is the task of translating the contents of an image into a written sentence in a given language (English in this case).
The Flickr 8k dataset used here consists of 8,000 images of different sizes, with 5 captions per image provided as text data. 6,000 images are used for training, 1,000 for testing, and the remaining 1,000 for randomly selecting images to check the accuracy and practicality of the model.
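The 6,000 / 1,000 / 1,000 partition described above can be sketched as follows; the filenames and the fixed shuffling seed are illustrative assumptions, not taken from the original code:

```python
import random

def split_flickr8k(image_names, seed=42):
    """Split the 8,000 Flickr 8k image names into
    6,000 train / 1,000 test / 1,000 demo partitions."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)  # reproducible shuffle
    train = names[:6000]
    test = names[6000:7000]
    demo = names[7000:]  # held out for spot-checking the model
    return train, test, demo

# Example with placeholder names:
all_images = [f"img_{i:04d}.jpg" for i in range(8000)]
train, test, demo = split_flickr8k(all_images)
print(len(train), len(test), len(demo))  # 6000 1000 1000
```

The actual Flickr 8k distribution ships with its own predefined train/test split files, so in practice those lists can be read directly instead of shuffling.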
The Inception V3 architecture by Google is used to extract image features and reduce training time. GloVe vectors, a set of pretrained embeddings for common English words, supply the word representations for the NLP side.
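Loading the GloVe vectors amounts to parsing the plain-text file, where each line holds a word followed by its embedding values, into a word-to-vector lookup. A minimal sketch (the function name and file path are assumptions for illustration):

```python
def parse_glove(lines):
    """Parse GloVe entries ('word v1 v2 ...') into a
    dict mapping each word to its embedding vector."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings

# Usage with the real file (path is an assumption):
# with open("glove.6B.200d.txt", encoding="utf-8") as f:
#     glove = parse_glove(f)

# Tiny inline example with 3-dimensional toy vectors:
sample = ["the 0.1 0.2 0.3", "cat -0.5 0.4 0.9"]
glove = parse_glove(sample)
print(glove["cat"][0])  # -0.5
```

In the real 200d file each line carries 200 values, and the resulting dict is typically used to build an embedding matrix for the caption model's vocabulary.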
The broad sequence of steps used to accomplish this task in Google Colaboratory is:
Links for the Flickr 8k image data, text data, and GloVe vectors:
Images and Text
https://www.kaggle.com/shadabhussain/flickr8k/download (use only the images and text data, not the trained model)
GloVe 200d.txt data (GloVe vectors)