Analyzing the correlations among the features of a data set gives more insight into, and a better understanding of, that data set.
In the technical world, data is considered the fuel that makes any algorithm work properly. That is why understanding the data is so important.
A data set can contain many features, and they can be correlated with each other in different ways. Some features are directly correlated, so they rise and fall together, while others depend on each other inversely, so one falls as the other rises. Analyzing these correlations is therefore an important step when working with any machine learning algorithm.
Several Python libraries provide built-in methods for computing these correlations. In pandas, the built-in method is corr(). Called on a DataFrame holding the user's data, it computes the pairwise correlation between every pair of numeric columns and returns the result as a square matrix. To see how the other features correlate with one chosen reference feature (for example, the target variable), select that feature's column from the matrix. The resulting values range from -1 to +1. A positive value indicates a direct relation between the feature and the reference feature, a negative value indicates an inverse dependency between the two, and a value near 0 indicates little or no linear relationship.
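As a minimal sketch of the corr() approach, assuming the data is held in a pandas DataFrame: the column names and values below are made up for illustration, and "exam_score" stands in for whichever feature is chosen as the reference.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 80],
    "hours_gaming": [9, 7, 6, 4, 2],
})

# Pairwise correlation between all numeric columns (Pearson by default).
corr_matrix = df.corr()

# Correlation of every feature with the chosen reference feature,
# sorted from the most direct relation to the most inverse one.
target_corr = corr_matrix["exam_score"].sort_values(ascending=False)

print(corr_matrix)
print(target_corr)
```

In this toy data, hours_studied rises with exam_score and so gets a positive coefficient, while hours_gaming falls as exam_score rises and gets a negative one.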
Another way to examine correlations is a scatter matrix, which can be imported from pandas.plotting. A scatter matrix is a grid of 2D plots showing how each feature of the data set relates to every other feature. If the data set has n features, the scatter matrix has n rows and n columns. The direction of the trend in each plot shows whether one feature increases or decreases as the other increases, which corresponds to the sign of the correlation between them.
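A scatter matrix along these lines could be drawn as follows. This is a sketch, assuming pandas and matplotlib are installed; the toy data and the output file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 80],
    "hours_gaming": [9, 7, 6, 4, 2],
})

# One scatter plot per pair of features; the diagonal shows a histogram
# of each individual feature. Returns an n x n array of matplotlib axes.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")

# Hypothetical output file name.
plt.savefig("scatter_matrix.png")
```

With three features the result is a 3 x 3 grid. An upward-sloping cloud of points suggests a direct relation and a downward-sloping one suggests an inverse relation.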
Submitted by KARTIK VASHIST (kartikvashist3)