Leave-One-Out Cross-Validation Using Python

Hey fellas!

Let us dive into today’s topic: Leave-One-Out Cross-Validation, a technique used in Machine Learning to assess a model’s performance and to get reliable information for improving the model further.

Leave-One-Out Cross-Validation, or LOOCV, is a special case of k-fold cross-validation where k is equal to the number of data points. We train our machine learning model n times (n is the size of the dataset). In every iteration, a different single sample is used as the test set while the remaining n-1 samples are used to train the model.

LeaveOneOut is imported from sklearn.model_selection.
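
As a quick sanity check of the "k equals the number of data points" statement, here is a minimal sketch that compares the splits produced by LeaveOneOut with those of KFold when n_splits is set to the dataset size. The six-row array is made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Six made-up samples with two features each (illustration only)
X = np.arange(12).reshape(6, 2)

# Collect (train indices, test indices) for every split of both splitters
loo_splits = [(train.tolist(), test.tolist())
              for train, test in LeaveOneOut().split(X)]
kfold_splits = [(train.tolist(), test.tolist())
                for train, test in KFold(n_splits=len(X)).split(X)]

print(len(loo_splits))             # number of LOOCV iterations
print(loo_splits == kfold_splits)  # do both splitters agree?
```

Both splitters should produce one split per sample, each with a single test index.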

Let me make it simpler with an example.

Let the size of a dataset be 6, i.e., n=6, with samples x1 to x6.
By using LOOCV,
Iteration 1:
x1 is the test set and x2, x3, x4, x5, x6 make up the training set
model score = score1

Iteration 2:
x2 is the test set and x1, x3, x4, x5, x6 make up the training set
model score = score2

This process continues for all n = 6 iterations, so every sample is used as the test set exactly once.

The final performance estimate is the average of the six individual scores:
Overall Score = (score1 + score2 + score3 + score4 + score5 + score6) / 6
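
The walk-through above can be sketched in a few lines. The sample names x1 to x6 follow the example, while the six per-iteration scores are hypothetical placeholders (not real model results) used only to show the averaging step.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Sample labels from the example above
samples = np.array(["x1", "x2", "x3", "x4", "x5", "x6"])

splits = []
for i, (train_idx, test_idx) in enumerate(LeaveOneOut().split(samples), start=1):
    train, test = samples[train_idx].tolist(), samples[test_idx].tolist()
    splits.append((train, test))
    print(f"Iteration {i}: test = {test[0]}, train = {', '.join(train)}")

# Hypothetical scores for the six iterations (placeholders, not measured)
scores = [1, 1, 0, 1, 1, 1]
overall_score = sum(scores) / len(scores)
print(f"Overall Score = {overall_score:.2f}")
```

Note that the sum is divided by 6 as a whole, which is why the parentheses in the formula matter.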

Now let us move to the next section and write a program that puts the LOOCV procedure described above into practice.

PROGRAM WITH EXAMPLE DATASET:

We shall use a basic dataset such as “Height and Weight” to perform LOOCV.

Index   Height (in cm)   Weight (in kg)   Gender (0: Female, 1: Male)
1       150              50               0
2       160              60               1
3       170              65               1
4       155              52               0
5       165              68               1
6       158              55               0
7       172              70               1
8       154              53               0
9       167              66               1
10      160              59               0

Features: Height and Weight
Target Variable: Gender

Step 1: Importing the necessary libraries

Import the libraries we need: NumPy to hold the dataset, LeaveOneOut to create the splits, LogisticRegression as the model, and accuracy_score to evaluate each prediction.

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Step 2: Create the height-weight dataset using NumPy

dataset=np.array([
    [150,50,0],
    [160,60,1],
    [170,65,1],
    [155,52,0],
    [165,68,1],
    [158,55,0],
    [172,70,1],
    [154,53,0],
    [167,66,1],
    [160,59,0]
])

Step 3: Create X and Y Variables

X stores the features (height and weight, i.e., every column except the last) and Y stores the target (gender, the last column).

X=dataset[:, :-1]
Y=dataset[:,-1]

Step 4: Initialize the model and LOOCV

We are using the LogisticRegression() model on the dataset for prediction.

model=LogisticRegression()
loo=LeaveOneOut()
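
Before running the loop, it can be worth confirming how many iterations LOOCV will perform. The small sketch below rebuilds only the two feature columns of the dataset above and asks the splitter for its number of splits, which should match the number of rows.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Height and weight columns of the ten rows in the dataset above
X = np.array([
    [150, 50], [160, 60], [170, 65], [155, 52], [165, 68],
    [158, 55], [172, 70], [154, 53], [167, 66], [160, 59],
])

loo = LeaveOneOut()
n_splits = loo.get_n_splits(X)  # one split per row of X
print(n_splits)
```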

Step 5: Create a list and perform LOOCV

A list has to be created to store accuracy scores for each iteration while performing LOOCV on the model.

accuracies=[]

Performing LOOCV:

for train_index, test_index in loo.split(X):
    # Hold out one sample for testing and train on the remaining ones
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

Fitting the model on the training data and predicting on the test data (these lines stay inside the for loop):

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

 

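Step 6: Calculating the accuracy and obtaining the final mean accuracy LOOCV score

Still inside the loop, compute the accuracy for the held-out sample and store it in the list:

    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)

After the loop has finished, average the stored scores:

mean_accuracy = np.mean(accuracies)
print(f'Mean accuracy across all iterations: {mean_accuracy:.2f}')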

 

Final mean accuracy output:

Mean accuracy across all iterations: 0.80
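
As an alternative to the manual loop, the same procedure can be condensed with scikit-learn's cross_val_score helper by passing a LeaveOneOut splitter as cv. This is a minimal sketch that reuses the dataset and default LogisticRegression() from the steps above. It is not part of the original program, and the printed value is not shown here because it depends on the fitted model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Same height-weight dataset as in Step 2 (last column is gender)
dataset = np.array([
    [150, 50, 0], [160, 60, 1], [170, 65, 1], [155, 52, 0], [165, 68, 1],
    [158, 55, 0], [172, 70, 1], [154, 53, 0], [167, 66, 1], [160, 59, 0],
])
X = dataset[:, :-1]  # features: height and weight
Y = dataset[:, -1]   # target: gender

# One accuracy score per held-out sample (each is either 0.0 or 1.0)
scores = cross_val_score(LogisticRegression(), X, Y,
                         cv=LeaveOneOut(), scoring="accuracy")
mean_accuracy = scores.mean()
print(f"Mean accuracy across all iterations: {mean_accuracy:.2f}")
```

Because each test set holds a single sample, every individual score is 0 or 1, and the mean is simply the fraction of samples predicted correctly.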

 

 

 
