Coders Packet

Handling Imbalanced Data using imbalanced-learn in Python

By Viraj Nayak

An overview of different undersampling and oversampling methods in the imbalanced-learn library for handling imbalanced data.

Many machine learning algorithms implicitly assume roughly balanced class distributions, so imbalanced datasets pose a challenge. When a class has few samples, those samples tend to be misclassified. The balance ratio is the ratio of the number of observations in the minority class to the number in the majority class. Classes can be balanced either by undersampling the majority class or by oversampling the minority class.
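To make the balance ratio and the two resampling directions concrete, here is a minimal sketch using only the Python standard library. The helper names (balance_ratio, random_undersample, random_oversample) and the 90/10 toy labels are assumptions made for illustration; this is not how imbalanced-learn implements its samplers, only the simplest random variant of each idea.

```python
import random
from collections import Counter


def balance_ratio(labels):
    """Minority-class count divided by majority-class count."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def _group_by_class(samples, labels):
    """Collect the samples belonging to each class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    pairs = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, target)]
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples (with replacement) until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        extra = rng.choices(xs, k=target - len(xs))  # empty for the largest class
        pairs.extend((x, y) for x in xs + extra)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Hypothetical toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
samples = list(range(100))
labels = [0] * 90 + [1] * 10

x_under, y_under = random_undersample(samples, labels)
x_over, y_over = random_oversample(samples, labels)

print("original ratio:", balance_ratio(labels))
print("after undersampling:", Counter(y_under), balance_ratio(y_under))
print("after oversampling:", Counter(y_over), balance_ratio(y_over))
```

Undersampling discards majority-class information, while random oversampling repeats minority samples and can encourage overfitting; that trade-off is the reason it is worth comparing several methods.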

In this project, we give an overview of different undersampling and oversampling techniques and their effect on the data. We then apply them to two classification datasets to compare their effect on model performance.

Jupyter Notebooks in the project:

  1. Various Undersampling and Oversampling Methods: This file contains an overview of different sampling methods and their effects on custom data.
  2. Implementation on Car Evaluation Dataset
  3. Implementation on Diabetes Dataset

To install imbalanced-learn: pip install imbalanced-learn


