Ice-cream flavor predictor using the scikit-learn library in Python
By Nitya Harshitha T
This project uses Python's scikit-learn library to predict which ice-cream flavor suits a person, given their age and gender as input.
Install Jupyter Notebook or, if you already use VS Code, add the Jupyter extension, as this is where we will run the code as a series of linked snippets. Also install the Pandas, NumPy, scikit-learn, and Matplotlib libraries in your Python environment.
- This project is a basic implementation of a decision tree that predicts the flavor a particular person is likely to prefer when their ‘age’ and ‘gender’ are supplied as dynamic inputs.
- It is a machine learning program built on the Python library scikit-learn (imported as ‘sklearn’). This library provides most of the popular ML algorithms, such as linear regression, logistic regression, and the Naïve Bayes classifier.
We will run the code as separate snippets that work together to build the decision tree model used for prediction.
- Import the libraries needed to load the CSV file, then print the DataFrame to check that it loaded correctly. The data for this program comes from the Coffee.csv file.
- Get the details of the dataset: how many rows and columns it has, the data type of each column, whether null values are present, and so on. The DataFrame's .info() method reports all of this.
- The output shows that the Gender and Flavour columns have the object data type, so they are categorical variables. Flavour is the outcome variable and can stay as it is; only the Gender attribute has to be converted from categorical to numerical. This is needed because scikit-learn's decision tree expects numeric input features and cannot be trained on raw string values.
- After converting the Gender attribute to a numerical data type, separate the input attributes “age” and “gender” as x and the output column “flavor” as y. The data can then be split into training and test sets for the decision tree model.
- Import the decision tree classifier from sklearn. A decision tree can be built with different algorithms such as ID3, CART, and C4.5. Here I have set the criterion to entropy, so splits are chosen by information gain as in ID3 (scikit-learn itself implements an optimized version of CART). Entropy is a measure of the impurity in the data provided.
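The steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names Age, Gender, and Flavour, the sample rows, the Male/Female to 0/1 mapping, the 25% test split, and the `entropy` helper are all made up for the example and may not match the real Coffee.csv or the notebook.

```python
import io

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for Coffee.csv; the real columns and values may differ.
CSV_TEXT = """Age,Gender,Flavour
8,Male,Chocolate
10,Female,Strawberry
12,Male,Chocolate
15,Female,Strawberry
22,Male,Vanilla
25,Female,Butterscotch
30,Male,Vanilla
35,Female,Butterscotch
41,Male,Coffee
45,Female,Coffee
52,Male,Coffee
60,Female,Coffee
"""

# With the real file this would be pd.read_csv("Coffee.csv").
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
df.info()  # row and column counts, dtypes, non-null counts

# Convert the categorical Gender column to integers (assumed mapping).
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Inputs (x) and outcome (y).
X = df[["Age", "Gender"]]
y = df["Flavour"]

# Hold back a quarter of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    probs = labels.value_counts(normalize=True).to_numpy()
    return float(-(probs * np.log2(probs)).sum())


print("Entropy of the Flavour column:", entropy(y))

# criterion="entropy" makes the tree pick splits by information gain.
model = DecisionTreeClassifier(criterion="entropy", random_state=42)
```

An entropy of 0 means every row has the same flavor; the value grows as the flavors become more evenly mixed, which is why the tree prefers splits that reduce it.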
Now we can fit the decision tree model to the training data and apply the trained model to the test data to make predictions. We can then compare the predicted flavors with the original flavors in the file to measure the accuracy. We can also build a confusion matrix to see, for each flavor, how many test samples were classified correctly and how many were confused with another flavor.
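A minimal sketch of the fit, predict, and evaluate step is given here. The data is invented for the example (Gender is assumed to be already encoded as 0/1), so the accuracy it prints says nothing about the real Coffee.csv.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sample data; Gender is assumed encoded as 0 = male, 1 = female.
df = pd.DataFrame({
    "Age": [8, 10, 12, 15, 22, 25, 30, 35, 41, 45, 52, 60],
    "Gender": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "Flavour": [
        "Chocolate", "Strawberry", "Chocolate", "Strawberry",
        "Vanilla", "Butterscotch", "Vanilla", "Butterscotch",
        "Coffee", "Coffee", "Coffee", "Coffee",
    ],
})
X = df[["Age", "Gender"]]
y = df["Flavour"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training rows, then predict the held-out rows.
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of test rows whose predicted flavor matches the original.
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Rows are the original flavors, columns the predicted flavors.
labels = sorted(y.unique())
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(pd.DataFrame(cm, index=labels, columns=labels))
```

Because there are more than two flavors, the diagonal of the matrix holds the correct predictions for each flavor and every off-diagonal cell counts one kind of mix-up.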
A function is written to take age and gender as dynamic inputs so that the system can automatically predict the flavor using the model developed above.
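Such a function might look something like the sketch below. The name `predict_flavour`, the `GENDER_CODES` mapping, and the training rows are hypothetical; the notebook's own function may be written differently.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumed encoding for the Gender column; must match the one used in training.
GENDER_CODES = {"male": 0, "female": 1}

# Hypothetical training data standing in for Coffee.csv.
train = pd.DataFrame({
    "Age": [8, 10, 12, 15, 22, 25, 30, 35, 41, 45, 52, 60],
    "Gender": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "Flavour": [
        "Chocolate", "Strawberry", "Chocolate", "Strawberry",
        "Vanilla", "Butterscotch", "Vanilla", "Butterscotch",
        "Coffee", "Coffee", "Coffee", "Coffee",
    ],
})
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(train[["Age", "Gender"]], train["Flavour"])


def predict_flavour(model, age, gender):
    """Return the flavor the fitted model predicts for one person."""
    key = gender.strip().lower()
    if key not in GENDER_CODES:
        raise ValueError(f"Unknown gender: {gender!r}")
    # Same column names and order as the training features.
    row = pd.DataFrame({"Age": [age], "Gender": [GENDER_CODES[key]]})
    return str(model.predict(row)[0])


print(predict_flavour(model, 27, "Female"))
```

In a notebook the two arguments could come from `input()` calls, with the age converted to `int` before it is passed in.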
We can also visualize this decision tree model using Matplotlib and the plot_tree function from sklearn.tree.
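One possible way to draw the tree is sketched here. The training rows, figure size, and styling options are assumptions chosen for the example, not the notebook's actual settings.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical training data standing in for Coffee.csv.
train = pd.DataFrame({
    "Age": [8, 10, 12, 15, 22, 25, 30, 35, 41, 45, 52, 60],
    "Gender": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "Flavour": [
        "Chocolate", "Strawberry", "Chocolate", "Strawberry",
        "Vanilla", "Butterscotch", "Vanilla", "Butterscotch",
        "Coffee", "Coffee", "Coffee", "Coffee",
    ],
})
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(train[["Age", "Gender"]], train["Flavour"])

# Draw every node with its split rule, entropy, sample count, and majority class.
fig, ax = plt.subplots(figsize=(12, 7))
annotations = plot_tree(
    model,
    feature_names=["Age", "Gender"],
    class_names=[str(c) for c in model.classes_],
    filled=True,   # color each node by its majority class
    rounded=True,
    ax=ax,
)
fig.tight_layout()
# In Jupyter the figure is rendered when the cell finishes;
# in a plain script, call plt.show() to open it.
```

Each box shows the condition used to split, so the path from the root to a leaf can be read as the rule the model applies for a given age and gender.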
The project consists of:
1) Flavour_predictor.ipynb notebook
2) Coffee.csv input data file
Output for the implemented prediction model