Hey fellas!
Let us dive into today’s topic: Leave-One-Out Cross-Validation, a technique used in machine learning to assess a model’s performance and provide reliable estimates that can guide further improvement of the model.
Leave-One-Out Cross-Validation, or LOOCV, is a special case of k-fold cross-validation where k is equal to the number of data points. We train our machine learning model n times (n is the size of the dataset). In each iteration, one sample is used as the test set while the remaining n-1 samples are used to train the model.
LeaveOneOut is imported from sklearn.model_selection.
Let me make this simpler with an example.
Let the size of the dataset be 6, i.e., n = 6.
By using LOOCV,
Iteration n=1:
x1 is the test set; x2, x3, x4, x5, x6 make up the training set
model score = score 1
Iteration n=2:
x2 is the test set; x1, x3, x4, x5, x6 make up the training set
model score = score 2
This process continues until all n samples have served as the test set.
The final performance estimate is the average of the six individual scores:
Overall Score = (score1 + score2 + score3 + score4 + score5 + score6) / 6
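As a quick sanity check, the split pattern described above can be reproduced with scikit-learn's LeaveOneOut class (a minimal sketch; the array values here are just placeholders standing in for x1 through x6):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Six placeholder samples standing in for x1 .. x6
X = np.arange(1, 7).reshape(-1, 1)

loo = LeaveOneOut()
for i, (train_index, test_index) in enumerate(loo.split(X), start=1):
    # Each iteration holds out exactly one sample as the test set
    print(f"Iteration {i}: test={test_index}, train={train_index}")
```

Running this prints six iterations, each with a single test index and the remaining five indices as the training set, mirroring the walkthrough above.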
In the next section, we shall write and run a program to illustrate the LOOCV concepts discussed above.
PROGRAM WITH EXAMPLE DATASET:
We shall use a basic dataset such as “Height and Weight” to perform LOOCV.
Index | Height (in cm) | Weight (in kg) | Gender (0: Female, 1: Male) |
---|---|---|---|
1 | 150 | 50 | 0 |
2 | 160 | 60 | 1 |
3 | 170 | 65 | 1 |
4 | 155 | 52 | 0 |
5 | 165 | 68 | 1 |
6 | 158 | 55 | 0 |
7 | 172 | 70 | 1 |
8 | 154 | 53 | 0 |
9 | 167 | 66 | 1 |
10 | 160 | 59 | 0 |
Features: Height and Weight
Target Variable: Gender
Step 1: Importing the necessary libraries
Import the essential libraries required to perform the desired functionalities.
```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
Step 2: Create the height-weight dataset using NumPy
```python
dataset = np.array([
    [150, 50, 0],
    [160, 60, 1],
    [170, 65, 1],
    [155, 52, 0],
    [165, 68, 1],
    [158, 55, 0],
    [172, 70, 1],
    [154, 53, 0],
    [167, 66, 1],
    [160, 59, 0],
])
```
Step 3: Create the X and y variables
Create X and y to store the features and the target respectively.
```python
X = dataset[:, :-1]
y = dataset[:, -1]
```
Step 4: Initialize the model and LOOCV
We are using a LogisticRegression() model on the dataset for prediction.
```python
model = LogisticRegression()
loo = LeaveOneOut()
```
Step 5: Create a list and perform LOOCV
A list has to be created to store accuracy scores for each iteration while performing LOOCV on the model.
```python
accuracies = []
```
Performing LOOCV, fitting the model on the training data, predicting on the test data, and recording the accuracy for each iteration:
```python
for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit on the training fold and predict the held-out sample
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
```
Step 6: Calculate the final mean accuracy LOOCV score
```python
mean_accuracy = np.mean(accuracies)
print(f'Mean accuracy across all iterations: {mean_accuracy:.2f}')
```
Final mean accuracy output:
```
Mean accuracy across all iterations: 0.80
```
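As a usage note, the manual loop above can also be condensed into a single call using scikit-learn's cross_val_score helper, passing LeaveOneOut as the cv argument (a sketch reusing the same dataset; each fold scores one held-out sample, so every per-fold accuracy is 0.0 or 1.0 and their mean is the LOOCV estimate):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

dataset = np.array([
    [150, 50, 0], [160, 60, 1], [170, 65, 1], [155, 52, 0], [165, 68, 1],
    [158, 55, 0], [172, 70, 1], [154, 53, 0], [167, 66, 1], [160, 59, 0],
])
X, y = dataset[:, :-1], dataset[:, -1]

# One accuracy score per held-out sample (default scoring for classifiers)
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(f"Mean LOOCV accuracy: {scores.mean():.2f}")
```

This produces the same estimate as the explicit loop while keeping the code shorter and less error-prone.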