Overview:
When we train a Machine Learning model, not all features (columns) are on the same scale; for example, one feature might be in meters while another is in kilometers. This imbalance can mislead algorithms such as SVM and KNN, which rely on distances between data points. To handle this, we use Feature Scaling techniques.
Feature Scaling is a technique that adjusts feature values onto a common scale. Two popular methods are Standardization and Normalization.
Standardization:
It transforms the data to have a mean of 0 and a standard deviation of 1. It is most useful when the model needs data that is centered and has a consistent spread.
Formula:
z = (x − μ) / σ
- x is the original value
- μ is the mean of the feature
- σ is the standard deviation
When to use the standardization method?
- When features are normally distributed
- When outliers are present (standardization distorts the data less than min-max scaling does)
- When an algorithm assumes that the data is centered, such as PCA or Linear Regression
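For example, here is a minimal sketch of standardization using scikit-learn's StandardScaler (the feature values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: heights in meters, distances in kilometers
X = np.array([[1.5, 120.0],
              [1.8, 300.0],
              [1.6,  50.0],
              [1.7, 210.0]])

scaler = StandardScaler()         # learns the mean and standard deviation of each column
X_std = scaler.fit_transform(X)   # applies z = (x - mu) / sigma column by column

print(X_std.mean(axis=0))  # approximately 0 for each feature
print(X_std.std(axis=0))   # approximately 1 for each feature
```

After scaling, both features contribute on the same footing regardless of their original units.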
Normalization:
It is the process of rescaling feature values so that they fall between 0 and 1. It gives every feature a uniform range and is simple and fast to compute, even on large datasets.
Formula:
x_scaled = (x − x_min) / (x_max − x_min)
- x is the original value
- x_min and x_max are the minimum and maximum values of the feature
When to use the normalization method?
- When the data does not follow a normal distribution, or its distribution is unknown
- When you are using distance-based algorithms such as KNN, or Neural Networks
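As a counterpart to the earlier sketch, here is a minimal example of normalization on the same made-up data, using scikit-learn's MinMaxScaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The same two features on very different scales
X = np.array([[1.5, 120.0],
              [1.8, 300.0],
              [1.6,  50.0],
              [1.7, 210.0]])

scaler = MinMaxScaler()            # records x_min and x_max for each column
X_norm = scaler.fit_transform(X)   # applies (x - x_min) / (x_max - x_min)

print(X_norm.min(axis=0))  # 0.0 for each feature
print(X_norm.max(axis=0))  # 1.0 for each feature
```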
Standardization vs. Normalization:
| Aspect | Standardization | Normalization |
| --- | --- | --- |
| Output range | No fixed range | [0, 1] |
| Large datasets | Handled less efficiently | Handled efficiently |
| Outliers | Handled better | Handled poorly |
| Formula | Based on mean and standard deviation | Based on min and max |
| Example | Exam score comparison | Smartphone battery health app |
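To see the outlier row of the table in practice, here is a small NumPy sketch on made-up values containing a single outlier:

```python
import numpy as np

# One feature with a single outlier (100.0) among typical values
x = np.array([10.0, 12.0, 11.0, 13.0, 100.0])

z = (x - x.mean()) / x.std()              # standardization: no fixed output range
m = (x - x.min()) / (x.max() - x.min())   # normalization: squeezed into [0, 1]

print(z)  # the typical values keep a visible spread; the outlier stands apart
print(m)  # the outlier maps to 1.0 and compresses the other values near 0
```

With min-max scaling, the four typical values end up crowded between 0 and about 0.03, while standardization preserves their relative spread. This is why the table rates standardization as handling outliers better.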
Conclusion:
To conclude, feature scaling methods help balance the features in a model so that no single feature dominates, which in turn helps the model produce more accurate results. Use standardization when the data needs to be centered or contains outliers, and normalization when you need a fixed [0, 1] range.