Coders Packet

Identifying potential customers for loans using Machine Learning techniques in Python

By Yashraj Tambe

Here we identify potential customers for loans using Machine Learning techniques based on Python

The project aims to identify customers who may or may not be eligible for a loan. Since the task is to categorize customers, it is a classification problem in machine learning. The data is historical and maps each customer's features to that customer's eligibility, so the correct labels are already known. Hence this is a supervised machine learning task.

Dataset: -

The dataset can be downloaded from here.

Dependencies: -

The dependencies for this project are limited to Python and the libraries Pandas, scikit-learn, Seaborn and Matplotlib. Python can be installed directly from the official Python website, but the libraries then need to be installed manually using the following commands: -

pip install pandas
pip install scikit-learn
pip install matplotlib
pip install seaborn

Alternatively, if Python is installed through the Anaconda distribution, these libraries are already included.

Initializing the model creation process: -

1) Importing libraries

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

2) Importing and reading the dataset

personaloan_df=pd.read_csv('Bank_Personal_Loan_Modelling-1.csv')
personaloan_df.head()

3) Checking the shape of the dataset

personaloan_df.shape

4) Checking for the presence of missing values

personaloan_df.isna().sum()
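As a small illustration of what this check reports, the sketch below runs it on a made-up frame. The column names mirror the loan dataset, but the values and the median-fill step are assumptions for illustration only; the write-up does not say whether the real data needed any imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: values are invented purely to show the check.
toy_df = pd.DataFrame({
    "Income": [49, 34, np.nan, 100],
    "CCAvg": [1.6, 1.5, 1.0, np.nan],
    "Personal Loan": [0, 0, 0, 1],
})

# Count missing values per column, as in step 4.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# One possible remedy (an assumption, not the author's method):
# fill each numeric gap with that column's median.
filled_df = toy_df.fillna(toy_df.median(numeric_only=True))
remaining_missing = int(filled_df.isna().sum().sum())
print(remaining_missing)
```

If every count is zero on the real dataset, no imputation is needed and the later steps can proceed unchanged.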

5) Checking the statistical summary of each column

personaloan_df.describe()

6) To examine the relationships among the variables, a pairplot and a correlation heatmap help identify relevant features

A pairplot can be drawn with the following code

sns.pairplot(personaloan_df,kind='reg',diag_kind='kde')

A heatmap of the correlation matrix can be drawn with the following code

from matplotlib import pyplot as plt
plt.figure(figsize=(25, 25))
ax = sns.heatmap(personaloan_df.corr(), annot=True)
plt.title('Correlation')
plt.show()

7) Based on the plots above, a few features were found to be redundant, so we drop them using

personaloan_df=personaloan_df.drop(labels=['ID','Age','Experience'],axis=1)
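The write-up does not state the exact criterion used to judge a feature redundant. One common heuristic is to flag pairs of features whose absolute correlation exceeds a threshold. The sketch below shows that idea on synthetic data; the helper name `correlated_pairs`, the 0.9 threshold and the generated values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumed): Experience is built to track Age closely,
# while Income is generated independently.
rng = np.random.default_rng(0)
age = rng.integers(25, 65, size=200)
demo_df = pd.DataFrame({
    "Age": age,
    "Experience": age - 25 + rng.integers(-1, 2, size=200),
    "Income": rng.integers(10, 200, size=200),
})

def correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, |corr|) for column pairs above the threshold."""
    corr = df.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only
            value = float(corr.iloc[i, j])
            if value > threshold:
                pairs.append((cols[i], cols[j], round(value, 3)))
    return pairs

pairs = correlated_pairs(demo_df)
print(pairs)
```

When two features carry nearly the same information, keeping only one of them (or neither, if neither relates to the target) is a judgment call that the heatmap can inform.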

8) Splitting the data into feature variables and the target variable

X=personaloan_df.drop(labels=['Personal Loan'],axis=1)
y=personaloan_df['Personal Loan']  # select the target directly as a 1-D Series

9) Splitting the data into train and test subsets with a ratio of 70:30

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30)
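Because the split above is random, accuracy figures can change from run to run. A minimal sketch of a reproducible, class-balanced split follows, assuming a fixed `random_state` and `stratify=y`; neither option appears in the original code, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: 100 rows, 10% positive class.
demo_df = pd.DataFrame({
    "Income": np.arange(100),
    "CCAvg": np.linspace(0, 10, 100),
    "Personal Loan": [1] * 10 + [0] * 90,
})
X_demo = demo_df.drop(columns=["Personal Loan"])
y_demo = demo_df["Personal Loan"]

# random_state fixes the shuffle; stratify keeps the class ratio
# roughly equal in the train and test subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

Stratifying matters most when one class is much rarer than the other, since a purely random split could leave very few positive examples in the test subset.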

With everything in place, we build the models by creating an object of each model class

10) Instantiating the models

Logreg=LogisticRegression()
navbay=GaussianNB()
NNH=KNeighborsClassifier(n_neighbors=3,weights='distance')

11) Next, we train each model on the training data

Logreg.fit(X_train,y_train)
navbay.fit(X_train,y_train)
NNH.fit(X_train,y_train)

12) Now we predict on the test set with each model, so that the predicted labels can be compared with the actual labels to see how accurately each model performs

y_pred_log=Logreg.predict(X_test)
y_pred_nav=navbay.predict(X_test)
y_pred_NNH=NNH.predict(X_test)

13) Now we use the accuracy_score function from sklearn.metrics to evaluate each model on the test set

scorecard_log=accuracy_score(y_test,y_pred_log)  # For logistic regression model
scorecard_nav=accuracy_score(y_test,y_pred_nav)  # For naive bayes model
scorecard_nnh=accuracy_score(y_test,y_pred_NNH)  # For K-NN model
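The fit, predict and score steps are identical for all three classifiers, so they can also be written as one loop. The sketch below assumes synthetic data from `make_classification` in place of the loan dataset, and `max_iter=1000` for logistic regression is an added assumption to avoid convergence warnings; the scores it prints say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (assumed stand-in for the loan data).
X_syn, y_syn = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=3, weights="distance"),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)              # train on the training subset
    predictions = model.predict(X_te)  # predict on the held-out subset
    scores[name] = accuracy_score(y_te, predictions)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping each model's predictions paired with its own score in this way avoids accidentally scoring one model against another model's predictions.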

14) A classification report gives a per-class breakdown of precision, recall and F1 score, along with the overall accuracy

f1_score_log=classification_report(y_test,y_pred_log)  # For logistic regression model
f1_score_nav=classification_report(y_test,y_pred_nav)  # For naive bayes model
f1_score_knn=classification_report(y_test,y_pred_NNH)  # For K-NN model
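The report is returned as a string, so it has to be printed to be read. To pull out individual numbers, `classification_report` also accepts `output_dict=True`. A minimal sketch on hand-made labels follows; the label lists are invented for illustration.

```python
from sklearn.metrics import classification_report

# Hand-made labels (assumed): 3 positives, of which 2 are predicted correctly.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_hat = [0, 0, 0, 1, 1, 0, 0, 1]

# Human-readable table.
print(classification_report(y_true, y_hat))

# Same metrics as a nested dict, keyed by the class label as a string.
report = classification_report(y_true, y_hat, output_dict=True)
precision_pos = report["1"]["precision"]
recall_pos = report["1"]["recall"]
overall_accuracy = report["accuracy"]
print(precision_pos, recall_pos, overall_accuracy)
```

For loan targeting, the precision and recall of the positive class are usually the most informative rows, since they describe how well actual loan takers are found.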

Of the three models, Logistic Regression performs best, with an accuracy of 90%.
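An accuracy figure is easier to interpret next to a trivial baseline. The sketch below assumes a hypothetical dataset in which 10% of customers take a loan (the class balance of the real dataset is not given here) and scores scikit-learn's `DummyClassifier`, which always predicts the most frequent class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 90 negatives and 10 positives.
y_demo = np.array([0] * 90 + [1] * 10)
X_demo = np.zeros((100, 1))  # the baseline ignores the feature values

# Always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_demo, y_demo)
baseline_accuracy = accuracy_score(y_demo, baseline.predict(X_demo))
print(baseline_accuracy)
```

Under that assumed class balance the baseline already reaches 0.9, so it is worth comparing a model's accuracy against the majority-class rate of the actual data and checking the classification report for the positive class.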

Note: The script was written and executed in Jupyter Notebook.


Submitted by Yashraj Tambe (yashraj)
