By Sahib Arora
This is a Python concept which uses a concept machine learning, called Logistic Regression, this code predicts a certain customers credit rating, 0 being bad and 1 being good.
The following code predicts the credit rating of a customer. In the code first the data set is read into the jupyter notebbok afterwhich the iloc statement is used to remove the predicted coloumn, we will build and train our model and later compare whether the predicted output and the actual output are same or not, then we split the data to training(70% of data) and testing data(30% of data) .
After splitting the data set we add a coloumn of ones (required for regression) then we find the siginificant variables. This data set has 16 rows meaning 16 features of variable X (eg. age, marital status, credit history, etc) not all the 16 variables are required to make the prediction therefore we remove the non significant variables, for this we print the summary of all the variables and remove the ones with p_value greater than 0.5. After we know the significant variables we apply logistic regression to only these variables. In logistic Regression the training data is plotted on a sigmoid function output of which is a decimal between 0 and 1. The prediction is yes (binary 1) if the value is above 0.5 else its a no(binary 0).
After applying the model to the test set we plot a heatmap using seaborn library to compare our predictions with actual value.It can be seen from the map that we have approximately 80% accuracy.