In this tutorial, we are going to learn about the chi-square fit test and how to use it in Python to check for an association between categorical variables.
To determine whether two categorical variables are significantly associated, we use the chi-square test. For example, we can build a dataset and look for an association between a vegetarian or non-vegetarian diet and a preference for low-calorie or diabetic food. If an association is found, it tells us something about the food preferences of different groups of people.
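For two categorical variables laid out as a table of counts, one option in SciPy is chi2_contingency. The sketch below is only an illustration: the counts are hypothetical and the row and column labels are assumptions based on the food example above, not data from this tutorial.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts, made up for illustration only.
# Rows: vegetarian, non-vegetarian
# Columns: prefers low-calorie food, prefers diabetic food
table = [
    [30, 10],
    [20, 40],
]

# Returns the test statistic, the p-value, the degrees of freedom,
# and the frequencies expected if the two variables were independent.
stat, p, dof, expected = chi2_contingency(table)

print("statistic:", stat)
print("p-value:", p)
print("degrees of freedom:", dof)
print("expected frequencies:")
print(expected)
```

For a 2x2 table, chi2_contingency applies Yates' continuity correction by default, so the statistic is slightly smaller than the uncorrected value.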
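The general formula for this test is $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed frequency and $E_i$ is the expected frequency in category $i$. In words: for each category, square the difference between the observed and expected frequency, divide by the expected frequency, and add up the results.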
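As a rough illustration of the arithmetic, the statistic is the sum over all categories of (observed - expected) squared, divided by expected. The counts below are hypothetical and chosen only to make the calculation easy to follow.

```python
# Hypothetical counts for five categories, for illustration only.
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]  # same total as the observed counts

# Sum over categories of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("chi-square statistic:", chi_square)
```

Here the per-category contributions are 0.2, 0.2, 0, 1.25 and 1.25, so the statistic comes to 2.9 (up to floating-point rounding).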
We import the chisquare function from scipy.stats so that we can use it directly in the code. There are other ways to perform this test as well.
The input to the function is the observed frequencies in each category.
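A minimal sketch of the call, assuming SciPy is installed. The observed counts are hypothetical. When no expected frequencies are passed, scipy.stats.chisquare assumes all categories are equally likely.

```python
from scipy.stats import chisquare

# Hypothetical observed frequencies in each category.
observed = [18, 22, 20, 25, 15]

# With no f_exp argument, the expected frequencies default to the
# mean of the observed counts (every category equally likely).
result = chisquare(observed)
statistic, p_value = result.statistic, result.pvalue

print("statistic:", statistic)
print("p-value:", p_value)
```

To test against a different expected distribution, pass it through the f_exp argument, for example chisquare(observed, f_exp=[20, 20, 20, 20, 20]). The expected counts should add up to the same total as the observed counts.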
If the p-value is very small, we reject the null hypothesis. A p-value below 0.05 is conventionally treated as statistically significant. It indicates strong evidence against the null hypothesis, because results this far from the expected frequencies would occur less than 5% of the time if the null hypothesis were true. Note that the p-value is not the probability that the null hypothesis is correct. When the p-value is less than 0.05, we do not retain the null hypothesis, and the assumption is rejected.
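The decision rule can be sketched as follows. The helper name decide and the sample inputs are hypothetical. The p-value is taken from the upper tail of the chi-square distribution using scipy.stats.chi2.sf, and 0.05 is used as the significance level.

```python
from scipy.stats import chi2


def decide(statistic, dof, alpha=0.05):
    """Return (p_value, reject), where reject is True if p_value < alpha."""
    # Probability of a statistic at least this large under the null hypothesis.
    p_value = float(chi2.sf(statistic, dof))
    return p_value, bool(p_value < alpha)


# Hypothetical example: statistic 2.9 with 4 degrees of freedom
# (five categories minus one).
p_value, reject = decide(2.9, 4)

if reject:
    print("p =", p_value, "-> reject the null hypothesis")
else:
    print("p =", p_value, "-> retain the null hypothesis")
```

With these inputs the p-value is well above 0.05, so the null hypothesis is retained. A much larger statistic, such as 20 with the same degrees of freedom, would lead to rejection.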
Submitted by RISHAV RANJAN (rishav808)