Feature Scaling: Standardization vs Normalization

Overview:

When we start building a machine learning model, the features (columns) often do not share the same scale. For example, one feature might be measured in meters while another is in kilometers. This imbalance can mislead algorithms such as SVM and KNN, which rely on distances between data points. To handle such features, we can use feature scaling.
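
To make the imbalance concrete, here is a minimal sketch of how a feature with large numeric values can dominate a Euclidean distance. The three points and their units (kilometers walked, altitude in meters) are made up for illustration.

```python
import math

# Hypothetical points: (distance walked in km, altitude in m).
a = (1.0, 500.0)
b = (5.0, 510.0)   # very different from a in km, similar in m
c = (1.1, 900.0)   # similar to a in km, different in m

def euclidean(p, q):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# The altitude column has the larger numbers, so it decides which
# point counts as "nearest"; the km column barely contributes.
print(f"a-b: {d_ab:.2f}, a-c: {d_ac:.2f}")
```

Without scaling, b looks far closer to a than c does, simply because altitude is recorded in larger numbers.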

Feature scaling adjusts feature values so that they share a common scale. The two most popular methods are standardization and normalization.

Standardization:

It transforms the data to have a mean of 0 and a standard deviation of 1. It is most useful when the model needs data that is centered and has a comparable spread across features.

Formula:

z = (x - μ) / σ

  • x is the original value
  • μ is the mean of the feature
  • σ is the standard deviation
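
The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, assuming the population standard deviation is intended; the function name `standardize` and the exam scores are made up for the example.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

scores = [50.0, 60.0, 70.0, 80.0, 90.0]  # made-up exam scores
z = standardize(scores)

# After scaling, the mean is 0 and the standard deviation is 1.
print([round(v, 3) for v in z])
```

In practice a library class such as scikit-learn's StandardScaler is the usual choice; the sketch only shows the arithmetic.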

When to use the standardization method?

  1. When features are normally distributed
  2. When outliers are present, since it is less sensitive to them than normalization
  3. When the algorithm assumes or benefits from centered data, such as PCA or linear regression

Normalization:

It is the process of rescaling feature values so that they fall between 0 and 1. It suits cases where every feature should share the same bounded range, and it is cheap to compute even on large datasets.

Formula:

x_scaled = (x - x_min) / (x_max - x_min)

  • x is the original value
  • x_min and x_max are the minimum and maximum values of the feature
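
As a rough sketch of the min-max formula, the following standard-library Python rescales a list of values to the range 0 to 1. The function name `min_max_scale` and the battery readings are hypothetical.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] using the feature's minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

battery = [20.0, 35.0, 50.0, 80.0, 100.0]  # made-up battery percentages
scaled = min_max_scale(battery)

# The smallest value maps to 0 and the largest to 1.
print(scaled)
```

scikit-learn's MinMaxScaler performs the same rescaling per column and is the more common choice in a real pipeline.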

When to use the normalization method?

  1. When the data is not normally distributed, or its distribution is unknown
  2. When you are using distance-based algorithms such as KNN, or neural networks

Differences between Standardization and Normalization:

Aspect            Standardization                       Normalization
Output range      No fixed range                        [0, 1]
Large datasets    Not handled well                      Handled well
Outlier handling  Better                                Poor
Formula           Based on mean and standard deviation  Based on minimum and maximum
Example           Exam score comparison                 Smartphone battery health app
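
One way to read the outlier comparison: both methods divide by a measure of spread, and a single extreme value inflates the min-max range far more than it inflates the standard deviation. The sketch below measures that on made-up data (the values 0 to 99 plus one outlier of 1000); it illustrates the idea and is not a benchmark.

```python
import statistics

inliers = [float(v) for v in range(100)]  # made-up values 0..99
with_outlier = inliers + [1000.0]         # one extreme value added

def value_range(values):
    """Divisor used by min-max normalization."""
    return max(values) - min(values)

# How much each method's divisor grows once the outlier is added.
range_growth = value_range(with_outlier) / value_range(inliers)
std_growth = statistics.pstdev(with_outlier) / statistics.pstdev(inliers)

# A larger growth factor means the ordinary values get squeezed more.
print(f"range grows {range_growth:.1f}x, std grows {std_growth:.1f}x")
```

Neither method removes the outlier, since both are linear rescalings. Standardization simply squeezes the ordinary values less in this setup.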

Conclusion:

Based on the above, I conclude that feature scaling methods help balance the features in a model, so that the model can produce more accurate results.
