Coders Packet

Preprocessing of Data using PCA (Principal Component Analysis) with Python

By Shivesh Chaturvedi

Programming language used: Python. This is a data preprocessing technique that reduces the number of features and can help protect your project from overfitting.

Preprocessing of Data using PCA

Hello guys! Today we are going to learn a simple but very important step in a Data Science project. So stay with me and let's get started.
I am going to use Python for this project.

PCA 

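What is PCA? PCA stands for Principal Component Analysis, an unsupervised machine learning technique used to reduce the number of features in a dataset. It transforms the original features into a smaller set of new, uncorrelated features (the principal components) that capture as much of the data's variance as possible.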
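To make the idea concrete, here is a minimal from-scratch sketch of what PCA does, using NumPy's SVD instead of scikit-learn. The function name `pca_numpy` and the random data are illustrative assumptions, not part of the original code: the sketch centers the data, finds the principal directions and projects the samples onto the first few of them.

```python
import numpy as np


def pca_numpy(X, n_components):
    """Project X (n_samples x n_features) onto its first n_components principal components."""
    # Center every feature at mean 0; principal directions are defined on centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Project the centered samples onto the kept directions
    return X_centered @ Vt[:n_components].T


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 samples, 5 features (made-up data)
Z = pca_numpy(X, n_components=2)
print(Z.shape)  # (100, 2)
```

Each column of `Z` is one new feature: the first column carries the most variance, the second the next most, and the columns are uncorrelated with each other.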

When building a Data Science project, preprocessing steps are a must, and PCA is one of the common ones: by reducing the number of features, PCA can lower the chances of overfitting. Having too many features is often called the "curse of dimensionality", and that is why I came up with this article, to make you aware of it.

OK guys, now let's start with the code.

First, have a look at the code attached to this article, as it will make things easier to follow. I will explain the important parts where you could face problems.

Step 1: Importing Libraries

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Jupyter-only magic: shows plots inside the notebook (remove this line in a plain .py script)
%matplotlib inline
from sklearn.datasets import load_breast_cancer

I didn't download any dataset here: load_breast_cancer, imported in the last line of the code above, loads a built-in dataset that ships with scikit-learn (569 samples, 30 numeric features).
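The article jumps from Step 1 to Step 4, and `mydata` is never defined in the text. A minimal sketch of how the built-in dataset could be loaded into a DataFrame with that name, assuming the attached code does something similar; the names `cancer` and `target` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# load_breast_cancer() returns a Bunch with .data, .target, .feature_names, ...
cancer = load_breast_cancer()

# "mydata" matches the variable used in Step 4; "target" (hypothetical name)
# holds the labels, where 0 = malignant and 1 = benign
mydata = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = cancer.target

print(mydata.shape)  # (569, 30)
```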

Step 4: Standardization (Feature Scaling)

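from sklearn.preprocessing import StandardScaler

# mydata: the DataFrame of 30 features prepared in the earlier steps
normalsc = StandardScaler()
normalsc.fit(mydata)                      # learn each feature's mean and standard deviation
scaler_data = normalsc.transform(mydata)  # rescale every feature to mean 0, std 1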

 

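In this step, I am scaling the features using StandardScaler. The features are on very different scales (for example, "mean area" values are in the hundreds or thousands while "mean smoothness" values are below 1). PCA looks for the directions of greatest variance, so if we reduce the features in this state, the large-valued features would dominate the result.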

So we use StandardScaler to put all features on the same scale: each one is transformed to have a mean of 0 and a standard deviation of 1.
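A small check of why scaling matters, written as a self-contained sketch (it reloads the dataset itself). It prints the spread of two raw features, confirms that standardized columns end up with mean 0 and standard deviation 1, and compares how much variance the first principal component grabs with and without scaling. The comparison is an illustration added here, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
mydata = pd.DataFrame(cancer.data, columns=cancer.feature_names)

# Raw features live on very different scales
print(mydata[["mean area", "mean smoothness"]].describe().loc[["mean", "std"]])

scaled = StandardScaler().fit_transform(mydata)
# After standardization every column has mean ~0 and standard deviation ~1
print(np.round(scaled.mean(axis=0)[:3], 6), np.round(scaled.std(axis=0)[:3], 6))

# Share of total variance captured by the first component, raw vs. standardized
raw_ratio = PCA(n_components=1).fit(mydata).explained_variance_ratio_[0]
scaled_ratio = PCA(n_components=1).fit(scaled).explained_variance_ratio_[0]
print(f"raw: {raw_ratio:.3f}, scaled: {scaled_ratio:.3f}")
```

On the raw data the first component soaks up almost all the variance simply because the area-type features have huge numeric ranges; after scaling, every feature gets a fair say.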

Step 5: PCA Operations

from sklearn.decomposition import PCA
pca = PCA(n_components=3)  # keep 3 principal components

Here n_components is the number of principal components (new features) we want to keep. I wanted to go from 30 features to 3, so I have written 3 in the code.
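To see how much information the 3 components keep, you can inspect `explained_variance_ratio_` after fitting. As an option (not used in the original article), scikit-learn also accepts a float between 0 and 1 for `n_components`, meaning "keep as many components as needed to explain this fraction of the variance". A minimal sketch, assuming the scaled breast cancer data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data)

# n_components as an int: keep exactly that many principal components
pca3 = PCA(n_components=3).fit(X)
print(pca3.explained_variance_ratio_)        # variance share of each component
print(pca3.explained_variance_ratio_.sum())  # total share kept by the 3 components

# n_components as a float in (0, 1): keep the smallest number of components
# that explains at least that fraction of the variance
pca95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print(pca95.n_components_)
```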

After this, I fit the PCA object to the scaled data with fit() and then project the data onto the 3 components with transform().
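The fit/transform step might look like the sketch below, assuming `scaler_data` is the standardized array from Step 4 (recomputed here so the block runs on its own); the output name `x_pca` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler_data = StandardScaler().fit_transform(load_breast_cancer().data)

pca = PCA(n_components=3)
pca.fit(scaler_data)                # learn the 3 principal directions
x_pca = pca.transform(scaler_data)  # project every sample onto them

print(scaler_data.shape, "->", x_pca.shape)  # (569, 30) -> (569, 3)
```

`pca.fit_transform(scaler_data)` does both steps in one call and gives the same result.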

That's it, you have learned PCA! Now you can apply any algorithm, such as Logistic Regression, Decision Tree or KNN, to the new set of features.
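As one example of that next step, here is a hedged sketch that feeds the 3 PCA features into Logistic Regression. It uses a scikit-learn `Pipeline` and a train/test split, which the article does not mention; the design choice is that the scaler and PCA are then fitted on the training data only, so no information from the test set leaks into the preprocessing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale -> reduce to 3 components -> classify; each step is fitted on X_train only
model = make_pipeline(StandardScaler(), PCA(n_components=3), LogisticRegression())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of correctly classified test samples
print(f"test accuracy: {accuracy:.3f}")
```

Swapping `LogisticRegression()` for a decision tree or KNN classifier only changes the last step of the pipeline.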

I think this covers everything you need so far. Go through the code carefully and keep learning more models. Learning is growing.

Thank you for reading this article and for your support.

