Hey fellas!
Let us dive into today’s topic: Leave-One-Out Cross-Validation, a technique used in machine learning to assess a model’s performance and provide reliable estimates that can guide further improvement of the model.
Leave-One-Out Cross-Validation, or LOOCV, is a special case of k-fold cross-validation where k is equal to the number of data points. We train our machine learning model n times (n is the size of the dataset). In each iteration, one sample is used as the test set while the remaining n-1 samples are used to train the model.
LeaveOneOut is imported from sklearn.model_selection.
Let me make this simpler with an example.
Let the size of the dataset be 6, i.e., n = 6.
By using LOOCV,
Iteration n=1:
x1 is the test set; x2, x3, x4, x5, x6 make up the training set
model score = score 1
Iteration n=2:
x2 is the test set; x1, x3, x4, x5, x6 make up the training set
model score = score 2
This process continues until all n samples have served as the test set.
The final performance estimate is the average of the six individual scores:
Overall Score = (score1 + score2 + score3 + score4 + score5 + score6) / 6
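As a quick sanity check, the split pattern described above can be reproduced with scikit-learn's LeaveOneOut class (a minimal sketch; the array values here are just placeholders standing in for x1 through x6):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Six placeholder samples standing in for x1 .. x6
X = np.arange(1, 7).reshape(-1, 1)

loo = LeaveOneOut()
for i, (train_index, test_index) in enumerate(loo.split(X), start=1):
    # Each iteration holds out exactly one sample as the test set
    print(f"Iteration {i}: test={test_index}, train={train_index}")
```

Running this prints six iterations, each with a single test index and the remaining five indices as the training set, mirroring the walkthrough above.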
In the next section, we shall write and run a program to illustrate the LOOCV concepts discussed above.
PROGRAM WITH EXAMPLE DATASET:
We shall use a basic dataset such as “Height and Weight” to perform LOOCV.
Index | Height (in cm) | Weight (in kg) | Gender (0: Female, 1: Male) |
---|---|---|---|
1 | 150 | 50 | 0 |
2 | 160 | 60 | 1 |
3 | 170 | 65 | 1 |
4 | 155 | 52 | 0 |
5 | 165 | 68 | 1 |
6 | 158 | 55 | 0 |
7 | 172 | 70 | 1 |
8 | 154 | 53 | 0 |
9 | 167 | 66 | 1 |
10 | 160 | 59 | 0 |
Features: Height and Weight
Target Variable: Gender
Step 1: Importing the necessary libraries
Import the essential libraries required to perform the desired functionalities.
```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
Step 2: Create the height-weight dataset using NumPy
```python
dataset = np.array([
    [150, 50, 0],
    [160, 60, 1],
    [170, 65, 1],
    [155, 52, 0],
    [165, 68, 1],
    [158, 55, 0],
    [172, 70, 1],
    [154, 53, 0],
    [167, 66, 1],
    [160, 59, 0],
])
```
Step 3: Create the X and y variables
Create X and y to store the features and the target respectively.
```python
X = dataset[:, :-1]
y = dataset[:, -1]
```
Step 4: Initialize the model and LOOCV
We are using a LogisticRegression() model on the dataset for prediction.
```python
model = LogisticRegression()
loo = LeaveOneOut()
```
Step 5: Create a list and perform LOOCV
A list has to be created to store accuracy scores for each iteration while performing LOOCV on the model.
```python
accuracies = []
```
Performing LOOCV, fitting the model on the training data, predicting on the test data, and recording the accuracy for each iteration:
```python
for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit on the training fold and predict the held-out sample
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
```
Step 6: Calculate the final mean accuracy LOOCV score
```python
mean_accuracy = np.mean(accuracies)
print(f'Mean accuracy across all iterations: {mean_accuracy:.2f}')
```
Final mean accuracy output:
```
Mean accuracy across all iterations: 0.80
```
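As a usage note, the manual loop above can also be condensed into a single call using scikit-learn's cross_val_score helper, passing LeaveOneOut as the cv argument (a sketch reusing the same dataset; each fold scores one held-out sample, so every per-fold accuracy is 0.0 or 1.0 and their mean is the LOOCV estimate):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

dataset = np.array([
    [150, 50, 0], [160, 60, 1], [170, 65, 1], [155, 52, 0], [165, 68, 1],
    [158, 55, 0], [172, 70, 1], [154, 53, 0], [167, 66, 1], [160, 59, 0],
])
X, y = dataset[:, :-1], dataset[:, -1]

# One accuracy score per held-out sample (default scoring for classifiers)
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(f"Mean LOOCV accuracy: {scores.mean():.2f}")
```

This produces the same estimate as the explicit loop while keeping the code shorter and less error-prone.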