Here we identify potential loan customers using machine learning techniques in Python.
The project aims to identify which customers are likely to be eligible for a loan. Since this means sorting customers into categories, it is a classification task. The data is historical, and each record pairs a customer's features with that customer's known eligibility, so this is a supervised machine learning task.
Dataset: -
The dataset can be downloaded from here.
Dependencies quest: -
Dependencies for this project are limited to Python and the libraries Pandas, Scikit-learn, Seaborn and Matplotlib. Python can be installed directly from the official Python website, but the libraries then need to be installed manually using the following commands: -
pip install pandas
pip install scikit-learn
pip install matplotlib
pip install seaborn
Otherwise, if the installation is done with Anaconda, these libraries are already included.
Initializing the model creation process: -
1)Importing libraries
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
2)Importing the dataset & reading the dataset
personaloan_df=pd.read_csv('Bank_Personal_Loan_Modelling-1.csv')
personaloan_df.head()
3)Checking the shape of dataset
personaloan_df.shape
4)Checking for presence of missing values
personaloan_df.isna().sum()
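As a small illustration of what this check reports, the sketch below builds a toy frame and counts the missing entries per column. The values are made up for the example and are not taken from the loan dataset.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; not from the loan dataset.
toy_df = pd.DataFrame({
    "Income": [49, 34, np.nan, 100],
    "Family": [4, 3, 1, np.nan],
    "CCAvg": [1.6, 1.5, 1.0, 2.7],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# One number that is 0 only when the frame has no missing values at all.
total_missing = int(missing_per_column.sum())
print("Total missing values:", total_missing)
```

A column whose count is 0 needs no imputation or row dropping before modelling.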
5)Checking the statistical information of each column
personaloan_df.describe()
6)To check the relationships among the variables, a pairplot & a correlation heatmap help to find the relevant features
A pairplot is plotted using the following piece of code
sns.pairplot(personaloan_df,kind='reg',diag_kind='kde')
A heatmap is plotted using the following code
plt.figure(figsize=(25, 25))
ax = sns.heatmap(personaloan_df.corr(), annot=True)
plt.title('Correlation')
plt.show()
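A large annotated heatmap can be hard to read by eye, so it may help to list the strongly correlated column pairs directly. The following is a minimal sketch on synthetic data: the helper name highly_correlated_pairs and the 0.9 threshold are assumptions made for this example, and the toy columns only borrow names from the dataset.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, corr) for column pairs with |corr| above threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = corr.iloc[i, j]
            if abs(value) > threshold:
                pairs.append((cols[i], cols[j], float(value)))
    return pairs


# Synthetic data: Experience is built to track Age closely, Income is independent.
rng = np.random.default_rng(0)
age = rng.integers(25, 65, size=200)
toy_df = pd.DataFrame({
    "Age": age,
    "Experience": age - 25 + rng.integers(-1, 2, size=200),
    "Income": rng.integers(10, 200, size=200),
})

pairs = highly_correlated_pairs(toy_df)
for col_a, col_b, value in pairs:
    print(col_a, col_b, round(value, 3))
```

Two columns that move almost in lockstep carry nearly the same information, which is one common reason to treat a feature as redundant.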
7)As per the plots above, a few features were redundant, so we drop them using
personaloan_df=personaloan_df.drop(labels=['ID','Age','Experience'],axis=1)
8)Splitting the data into feature variable & target variable
X=personaloan_df.drop(labels=['Personal Loan'],axis=1)
# Selecting the target column directly gives a 1-D Series, which scikit-learn expects
y=personaloan_df['Personal Loan']
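Before splitting further, it can be worth checking that the target did not leak into the features and how balanced the two classes are. The sketch below does this on a made-up frame; the numbers are illustrative and say nothing about the real dataset.

```python
import pandas as pd

# Toy data for illustration only; not from the loan dataset.
toy_df = pd.DataFrame({
    "Income": [49, 34, 180, 100, 45, 29, 72, 22, 81, 180],
    "Personal Loan": [0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
})

target_column = "Personal Loan"
X_toy = toy_df.drop(columns=[target_column])  # features only
y_toy = toy_df[target_column]                 # 1-D target

# Share of each class in the target, as fractions that sum to 1.
class_share = y_toy.value_counts(normalize=True)
print(class_share)
```

When one class is much rarer than the other, accuracy on its own can look good even for a model that rarely predicts the rare class.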
9)Splitting the data into train & test subsets with a ratio of 70:30
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30)
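The split is random, so the scores can change from run to run. A possible variation, sketched here on synthetic data, fixes the seed with random_state and keeps the class ratio equal in both subsets with stratify. Both are existing parameters of train_test_split; the seed value and the toy data are assumptions for this example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 10 of them in the positive class.
rng = np.random.default_rng(42)
X_toy = pd.DataFrame({
    "Income": rng.integers(10, 200, size=100),
    "CCAvg": rng.random(100) * 10,
})
y_toy = pd.Series([1] * 10 + [0] * 90, name="Personal Loan")

# random_state makes the split repeatable; stratify preserves the 10% positive share.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=0, stratify=y_toy
)

print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```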
Everything is in place, so we build each model by creating an object of its model class
10)Instantiating the model
Logreg=LogisticRegression()
navbay=GaussianNB()
NNH=KNeighborsClassifier(n_neighbors=3,weights='distance')
11)Finally let's train the model on our data
Logreg.fit(X_train,y_train)
navbay.fit(X_train,y_train)
NNH.fit(X_train,y_train)
12)Now we can predict on the test set and compare the predicted labels with the actual labels to see how accurately each model performs
y_pred_log=Logreg.predict(X_test)
y_pred_nav=navbay.predict(X_test)
y_pred_NNH=NNH.predict(X_test)
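Since all three estimators share the same fit and predict interface, the instantiate, train and predict steps can also be written as one loop. This is a minimal sketch on synthetic data from make_classification, not the loan dataset; the dictionary keys, the sample sizes and max_iter=1000 are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the loan data (assumption: 500 rows, 5 features).
X_syn, y_syn = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=3, weights="distance"),
}

# Fit each model on the training subset and store its test-set predictions.
predictions = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    predictions[name] = model.predict(X_te)
    print(name, predictions[name][:10])
```

Keeping the predictions in a dictionary makes it easy to score every model with the same code afterwards.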
13)Now we use the accuracy_score & classification_report functions from sklearn.metrics to evaluate our models
scorecard_log=accuracy_score(y_test,y_pred_log)#For logistic model
scorecard_nav=accuracy_score(y_test,y_pred_nav)#For naive bayes model
scorecard_nnh=accuracy_score(y_test,y_pred_NNH)#For K-NN model
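Accuracy is simply the fraction of test rows whose predicted label matches the actual label. The worked example below uses ten hand-written labels (made up for illustration) to show the calculation, and compares it with a baseline that always predicts the majority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hand-written labels for illustration: 8 negatives, 2 positives.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_hat = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Accuracy by hand: the share of positions where prediction and truth agree.
manual_accuracy = float((y_true == y_hat).mean())
library_accuracy = accuracy_score(y_true, y_hat)

# Baseline that ignores the features and always predicts class 0.
baseline_pred = np.zeros_like(y_true)
baseline_accuracy = accuracy_score(y_true, baseline_pred)

print(manual_accuracy, library_accuracy, baseline_accuracy)
```

In this toy case the always-0 baseline scores the same as the predictions that actually found one positive, which is why an accuracy figure is easier to judge next to the class balance.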
14)A classification report gives a per-class breakdown of precision, recall & F1 score, along with the overall accuracy
f1_score_log=classification_report(y_test,y_pred_log)#For logistic regression model
f1_score_nav=classification_report(y_test,y_pred_nav)#For naive bayes model
f1_score_knn=classification_report(y_test,y_pred_NNH)#For knn model
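By default classification_report returns a formatted string meant for printing. If individual numbers are needed, the output_dict=True option returns a nested dictionary instead. A small sketch with hand-written labels (made up for illustration):

```python
from sklearn.metrics import classification_report

# Hand-written labels for illustration: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_hat = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Text version, intended for print().
print(classification_report(y_true, y_hat))

# Dictionary version: keys are the class labels as strings, plus summary rows.
report = classification_report(y_true, y_hat, output_dict=True)
positive = report["1"]
print(positive["precision"], positive["recall"], positive["f1-score"])
```

Here class 1 has one true positive, one false positive and one false negative, so its precision, recall and F1 score are all 0.5 even though overall accuracy is 0.8.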
Of the three models, Logistic Regression performs best, with an accuracy of 90%.
Note: The script was coded & executed in Jupyter Notebook.
Submitted by Yashraj Tambe (yashraj)