Coders Packet



In this tutorial, we are going to learn about the chi-square goodness-of-fit test and how to use it in Python to check whether there is a significant association (correlation) between categorical variables.

Chi-Square test

We use the chi-square test to determine whether two categorical variables have a significant association (correlation) between them. For example, we could build a dataset and check whether the type of food (vegetarian or non-vegetarian) is associated with its category (low-calorie or diabetic). If an association is found, we can learn something about the food preferences of different people.
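For the two-variable case described above, the counts are usually arranged in a contingency table. The sketch below assumes SciPy is installed and uses made-up counts purely for illustration. Note that it calls scipy.stats.chi2_contingency (a test of independence), which is a different function from the chisquare goodness-of-fit function used in the steps of this tutorial.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts, invented for illustration only.
# Rows: vegetarian, non-vegetarian
# Columns: low-calorie, diabetic
table = [
    [30, 10],
    [20, 40],
]

# Returns the statistic, the p-value, the degrees of freedom and the
# expected counts under independence. For a 2x2 table SciPy applies
# Yates' continuity correction by default (correction=True).
chi2, p, dof, expected = chi2_contingency(table)

print(f"chi2 = {chi2:.3f}, p = {p:.4f}, dof = {dof}")
print(expected)
```

The expected counts come from the row and column totals (row total × column total / grand total), so they describe what the table would look like if food type and food category were unrelated.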
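The general formula for the test statistic is the sum, over all categories, of the squared difference between the observed frequency O_i and the expected frequency E_i, divided by the expected frequency:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$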
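To make the formula concrete, here is a minimal sketch that computes the statistic by hand in plain Python. The function name chi_square_statistic and the counts are hypothetical, chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for three categories
observed = [50, 30, 20]
expected = [40, 40, 20]

# (10^2)/40 + (-10^2)/40 + 0/20 = 2.5 + 2.5 + 0
print(chi_square_statistic(observed, expected))  # 5.0
```

Each category contributes more to the statistic the further its observed count is from its expected count, relative to the size of the expected count.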

STEP 1: Import Libraries

We import chisquare from scipy.stats so that we can use it directly in the code. There are other ways to perform the test as well, for example computing the statistic by hand from the formula.
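As a quick sanity check, the sketch below (assuming SciPy is installed, with hypothetical counts) compares the result of scipy.stats.chisquare with the statistic computed directly from the formula.

```python
from scipy.stats import chisquare

# Hypothetical frequencies; both lists sum to 100
observed = [50, 30, 20]
expected = [40, 40, 20]

# SciPy's built-in goodness-of-fit test
result = chisquare(f_obs=observed, f_exp=expected)

# The same statistic computed directly from the formula
manual = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Both values should be 5.0
print(result.statistic, manual)
print(result.pvalue)
```

The advantage of the library function is that it also returns the p-value, computed from the chi-square distribution with (number of categories − 1) degrees of freedom.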

STEP 2: Initialize the observed frequencies, then compute the expected frequencies from the expected percentages.


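The two main parameters of chisquare are:

f_obs: array_like
Observed frequencies in each category.

f_exp: array_like, optional
Expected frequencies in each category. By default, the categories are assumed to be equally likely.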
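A minimal sketch of this step, assuming SciPy is installed: the observed counts and the expected percentages are hypothetical. The percentages are converted into expected counts by scaling them to the total of the observed counts, because the two arrays should describe the same number of observations (recent SciPy versions check that their totals agree).

```python
from scipy.stats import chisquare

# Hypothetical observed counts for four categories
f_obs = [18, 22, 35, 25]

# Hypothetical expected percentages for the same categories (sum to 100)
expected_pct = [20, 20, 40, 20]

# Turn the percentages into expected counts with the same total as f_obs
total = sum(f_obs)
f_exp = [pct * total / 100 for pct in expected_pct]

stat, p = chisquare(f_obs=f_obs, f_exp=f_exp)
print(f_exp)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

If f_exp were omitted, chisquare would instead compare the observed counts against equal expected counts in every category.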

STEP 3: Perform the chi-square test

If the p-value is very small, we reject the null hypothesis. By convention, a p-value below 0.05 is considered statistically significant: if the null hypothesis were true, a result at least as extreme as the one observed would occur less than 5% of the time, which is taken as strong evidence against it. So when the p-value is less than 0.05, we reject the null hypothesis (the assumption that the observed frequencies match the expected ones); otherwise, we fail to reject it.
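The decision rule can be written as a small helper. This is a sketch assuming SciPy is installed; the function name decide, the ALPHA constant and the counts are hypothetical, and 0.05 is only the conventional significance level.

```python
from scipy.stats import chisquare

ALPHA = 0.05  # conventional significance level


def decide(f_obs, f_exp, alpha=ALPHA):
    """Run the goodness-of-fit test and apply the p-value decision rule."""
    stat, p = chisquare(f_obs=f_obs, f_exp=f_exp)
    if p < alpha:
        return stat, p, "reject the null hypothesis"
    return stat, p, "fail to reject the null hypothesis"


# Hypothetical counts that differ strongly from the expected ones
stat, p, verdict = decide([60, 25, 15], [40, 40, 20])
print(f"chi2 = {stat:.3f}, p = {p:.4g}: {verdict}")
```

Note that failing to reject the null hypothesis does not prove it is true; it only means the data do not provide strong enough evidence against it at the chosen level.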
