Coders Packet

Data explanation using Visualization in Python

By RISHAV RANJAN

In this tutorial, we will learn about the different visualization tools used for explaining data in Python.

Data Visualization

Data visualization represents data in a graphical format. When there is a large amount of data, such as a time series, it is an efficient way of communicating that data. Data visualization is considered a branch of descriptive statistics.


In this tutorial, we will cover some basic data visualization tools:
a) Bar plot
b) Histogram
c) Distribution plot
d) Boxplot
e) Pairplot
f) Correlation heatmap

Barplot

A bar plot represents categories of data with rectangular bars whose length (or height) is proportional to the values they represent. The bars can be plotted horizontally or vertically. A bar plot is used to compare discrete categories.

Step 1: Import the necessary libraries
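The exact imports are not preserved. The following is a minimal sketch that assumes the commonly used stack of NumPy, pandas, Matplotlib and seaborn:

```python
# Assumed library stack for this tutorial (the original import list is not shown).
import numpy as np               # numerical arrays and random data
import pandas as pd              # tabular data (DataFrame)
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt  # low-level plotting interface
import seaborn as sns            # statistical plots built on top of Matplotlib
```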

Step 2: Import Data
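The original dataset is not shown. As a stand-in, this sketch reads a small, made-up CSV from a string with pandas' `read_csv()`. The columns `category` and `sales` are hypothetical. With a real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the tutorial's dataset.
# With a real file: df = pd.read_csv("your_file.csv")  (hypothetical path)
csv_text = """category,sales
A,23
B,17
C,35
D,29
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())   # first rows, to check the data loaded correctly
print(df.shape)    # (rows, columns)
```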

Step 3: Code for Barplot
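Here is a minimal bar plot sketch using Matplotlib. It assumes the same hypothetical `category`/`sales` data, recreated here so the block runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one value per discrete category.
df = pd.DataFrame({"category": ["A", "B", "C", "D"],
                   "sales": [23, 17, 35, 29]})

fig, ax = plt.subplots()
# Vertical bars: each bar's height is proportional to the value it represents.
bars = ax.bar(df["category"], df["sales"], color="steelblue")
ax.set_xlabel("Category")
ax.set_ylabel("Sales")
ax.set_title("Bar plot")
fig.savefig("barplot.png")
```

For horizontal bars, `ax.barh(...)` takes the same arguments. seaborn's `sns.barplot()` is an alternative that, by default, aggregates repeated categories to their mean.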

Histogram

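A histogram is an approximate representation of the distribution of numerical data. It is used for continuous data: the range of values is divided into intervals (bins), and the height of each bar shows how many values fall into that bin.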

CODE:
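Here is a minimal histogram sketch with Matplotlib's `hist()`. It uses made-up, normally distributed values as the continuous data, and the choice of 20 bins is an arbitrary assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical continuous data: 1,000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()
# bins=20 splits the data range into 20 equal-width intervals;
# each bar's height is the number of values falling in that interval.
counts, edges, patches = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram")
fig.savefig("histogram.png")
```

Fewer bins give a smoother shape. More bins show finer detail but look noisier.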

Distribution plot

A distribution plot visualizes the distribution of data. It is best suited to numerical data, for comparing the ranges and distributions of different groups. It is not suited to in-depth analysis of the data, because it shows only a summary of the data's distribution.

CODE:

Boxplot

A boxplot depicts numerical data graphically through its quartiles. The box spans the lower quartile (Q1) to the upper quartile (Q3), with a line at the median, and the lines extending from the box (whiskers) indicate variability outside the upper and lower quartiles. Outliers are plotted as individual points. A boxplot is non-parametric: it displays the variation in samples of a statistical population without making any assumptions about the underlying statistical distribution. The spacing between the different parts of the box indicates the degree of dispersion (spread) and skewness in the data.

CODE:
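This sketch computes the quartiles behind a boxplot with NumPy and then draws one with Matplotlib. The three groups are made up. The two extreme values added to group C are there to show outliers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical samples; group C gets two extreme values to produce outliers.
rng = np.random.default_rng(seed=2)
group_a = rng.normal(50, 5, 100)
group_b = rng.normal(55, 12, 100)
group_c = np.append(rng.normal(45, 4, 98), [80, 10])

# Quartiles that define the box: Q1 (25th), median (50th), Q3 (75th percentile).
q1, median, q3 = np.percentile(group_c, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box
print(f"Group C: Q1={q1:.1f}, median={median:.1f}, Q3={q3:.1f}, IQR={iqr:.1f}")

fig, ax = plt.subplots()
# By default the whiskers reach the furthest data point within 1.5 * IQR of the box;
# points beyond that are drawn individually as outliers ("fliers").
result = ax.boxplot([group_a, group_b, group_c])
ax.set_xticklabels(["A", "B", "C"])
ax.set_ylabel("Value")
ax.set_title("Boxplot")
fig.savefig("boxplot.png")
```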

Pairplot

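The pairplot() function from the seaborn library is used to plot multiple pairwise bivariate distributions in a dataset. Each pair of numeric variables gets its own scatter plot, and the diagonal shows the distribution of each individual variable.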

CODE:

Correlation heatmap

A correlation heatmap is a 2D heatmap that displays a correlation matrix. The variables appear both as rows and as columns, and each cell is colored according to the correlation coefficient between its row variable and its column variable.

CODE:
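Here is a sketch of a correlation heatmap. pandas' `DataFrame.corr()` computes the correlation matrix (Pearson, by default), and `sns.heatmap()` colors it. The columns `a`, `b` and `c` are hypothetical, and `b` is deliberately built from `a`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric data; "b" is derived from "a", so the two are strongly correlated.
rng = np.random.default_rng(seed=4)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=300),
    "c": rng.normal(size=300),
})

# Correlation matrix: cell (i, j) is the correlation between column i and column j.
corr = df.corr()

fig, ax = plt.subplots()
# annot=True writes each coefficient in its cell; vmin/vmax fix the color scale to [-1, 1].
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap")
fig.savefig("heatmap.png")
```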
