Programming language used: Python
This is a data preprocessing project that reduces the number of features and helps protect your model from overfitting.
Hello guys, today we are going to learn a simple but very necessary step in a Data Science project. So stay with me and let's start building our model.
I am going to use Python for this project.
What is PCA? PCA stands for Principal Component Analysis. It is a dimensionality reduction technique that replaces the original features of a dataset with a smaller number of new features (the principal components) that keep as much of the variance in the data as possible.
Preprocessing steps are a must when building a Data Science project, and PCA is one of them. By cutting down the number of features, PCA can help reduce the chances of overfitting. Having too many features is often called the curse of dimensionality, and that's why I wrote this article to make you aware of it.
OK guys, now I will start with the code.
First, have a look at the code I have attached with this article, as it makes things simpler to understand. I will explain the important parts of the code where you could face problems.
Step 1: Importing Libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
from sklearn.datasets import load_breast_cancer
Here I didn't download any dataset. The last import in the code above, load_breast_cancer, loads a dataset that comes built into scikit-learn.
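As a rough sketch of how the built-in dataset can be loaded, the snippet below puts the features into a pandas DataFrame. The name mydata is reused from the later steps of this article, but building it this way is an assumption on my part, and the attached code may do it differently.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset that ships with scikit-learn
cancer = load_breast_cancer()

# Assumption: the features are kept in a DataFrame called mydata
mydata = pd.DataFrame(cancer.data, columns=cancer.feature_names)

print(mydata.shape)         # (569, 30): 569 samples, 30 features
print(cancer.target_names)  # the two class labels
```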
Step 4: Standardization
from sklearn.preprocessing import StandardScaler
normalsc = StandardScaler()
normalsc.fit(mydata)
scaler_data = normalsc.transform(mydata)
In this step, I scale the features using StandardScaler. The features have very different ranges, and PCA looks for the directions with the largest variance, so if we reduce the features without scaling, the features with the biggest values will dominate the result.
So we use StandardScaler, which rescales every feature to zero mean and unit variance and puts them all on the same scale.
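To see what the scaling does, here is a small self-contained check. It assumes the scikit-learn breast cancer dataset is the data being scaled, and the variable names X and X_scaled are only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data

# Before scaling: the spread of the features differs a lot
print(X.std(axis=0).min(), X.std(axis=0).max())

# fit_transform() is a shortcut for fit() followed by transform()
X_scaled = StandardScaler().fit_transform(X)

# After scaling: every feature has mean close to 0 and standard deviation close to 1
print(np.allclose(X_scaled.mean(axis=0), 0.0))
print(np.allclose(X_scaled.std(axis=0), 1.0))
```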
Step 5: PCA Operations
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
Here we set the number of components we want to reduce the features to. I wanted to go from 30 features to 3, so I wrote n_components=3 in the code.
After this, I fit PCA on the scaled data with fit() and then convert the data to the new components with transform().
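A minimal sketch of the fit and transform steps is shown below. It assumes the scaled breast cancer data from the previous step, and pca_data is a name I picked for the result, not one taken from the attached code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data
scaler_data = StandardScaler().fit_transform(X)

pca = PCA(n_components=3)
pca.fit(scaler_data)                   # learn the 3 principal components
pca_data = pca.transform(scaler_data)  # project the data onto them

print(scaler_data.shape)  # (569, 30)
print(pca_data.shape)     # (569, 3)

# Share of the total variance kept by each component, and in total
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```

The explained variance ratio is a handy way to judge whether 3 components are enough: if the total is low, you may want to keep more components.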
That's it, you have learned PCA. Now you can apply any algorithm, such as Logistic Regression, Decision Tree, or KNN, to the new set of features.
By now, I think you have everything you need. Go through the code carefully and keep learning more and more models. Learning is growing.
Thank you for reading this article and for your support.
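As one possible way to continue, the sketch below trains Logistic Regression on the 3 PCA features. The train/test split, the pipeline, and the parameter values are my own assumptions for illustration and are not part of the attached code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out part of the data so the model is tested on unseen samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale, reduce 30 features to 3 components, then classify
model = make_pipeline(StandardScaler(), PCA(n_components=3), LogisticRegression())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(accuracy)
```

Putting the scaler and PCA inside a pipeline means they are fitted only on the training data, so nothing from the test set leaks into the preprocessing.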
Submitted by Shivesh Chaturvedi (shiveshchaturvedi)