By Kiran Reddy
In this project, we will learn how to 1) scrape COVID-19 data for India from the web and 2) visualize it on a bubble map.
For this project, we will scrape the Wikipedia page that contains the state-wise COVID-19 data for India.
Before scraping, let us import the required packages.
import pandas as pd
# for getting the India map
import folium
After importing the packages, read the CSV file containing the latitudes and longitudes of the Indian states.
coordinates = pd.read_csv("latandlon.csv")
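The contents of latandlon.csv are not shown in this post. Judging from the join and the merged table further down, the file needs at least the columns State, Latitude and Longitude. The following is a minimal sketch under that assumption: `load_coordinates` is a hypothetical helper, not part of the original project, and the in-memory sample stands in for the real file using two rows taken from the merged table.

```python
import io
import pandas as pd

# Column names inferred from the join and the merged table in this post
REQUIRED_COLUMNS = ["State", "Latitude", "Longitude"]

def load_coordinates(source):
    """Read the coordinates CSV and fail early if an expected column is missing."""
    coords = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in coords.columns]
    if missing:
        raise ValueError(f"coordinates file is missing columns: {missing}")
    # Stray spaces in state names would stop rows from matching in the join
    coords["State"] = coords["State"].str.strip()
    return coords

# Stand-in for latandlon.csv (note the trailing space after "Bihar")
sample_csv = io.StringIO(
    "State,Latitude,Longitude\n"
    "Assam,26.749981,94.216667\n"
    "Bihar ,25.785414,87.479973\n"
)
coordinates_sample = load_coordinates(sample_csv)
print(coordinates_sample)
```

With the real file, the same helper would be called as `load_coordinates("latandlon.csv")`.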
Now we will scrape the COVID-19 data.
# Retrieving the live COVID-19 stats from Wikipedia
coronastats = pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_in_India#covid19-container', match='State/Union Territory')
# Convert to DataFrame
covid19 = pd.DataFrame(coronastats[0])
# Cleaning the covid19 DataFrame
# Removing unnecessary rows at the tail and unnecessary columns
covid19 = covid19.iloc[:-2, :-4]
# Renaming attribute names for simplicity
covid19.columns = ['State', 'Total cases', 'Deaths', 'Recoveries', 'Active cases']
# covid19.head()
covid19
The cleaned COVID-19 data looks like this:
| | State | Total cases | Deaths | Recoveries | Active cases |
|---|---|---|---|---|---|
| 0 | Andaman and Nicobar Islands | 4126 | 56 | 3892 | 178 |
| 1 | Andhra Pradesh | 786050 | 6453 | 744532 | 35065 |
| 2 | Arunachal Pradesh | 13643 | 30 | 10780 | 2833 |
| 3 | Assam | 201,407[b] | 875 | 173213 | 27319 |
| 4 | Bihar | 205945 | 1003 | 194005 | 10937 |
Now we will join the two data frames: the coordinates data frame and the covid19 data frame.
# Joining the two data frames on the State column
covid = covid19.join(coordinates.set_index('State'), on='State')
covid
The joined result is stored in the covid data frame, which looks like this:
| | State | Total cases | Deaths | Recoveries | Active cases | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 0 | Andaman and Nicobar Islands | 4126 | 56 | 3892.0 | 178 | 11.667026 | 92.735983 |
| 1 | Andhra Pradesh | 786050 | 6453 | 744532.0 | 35065 | 17.686800 | 83.218500 |
| 2 | Arunachal Pradesh | 13643 | 30 | 10780.0 | 2833 | 27.100399 | 93.616601 |
| 3 | Assam | 201,407[b] | 875 | 173213.0 | 27319 | 26.749981 | 94.216667 |
| 4 | Bihar | 205945 | 1003 | 194005.0 | 10937 | 25.785414 | 87.479973 |
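`DataFrame.join` performs a left join by default, so every row of covid19 is kept and a state whose name has no exact match in the coordinates table ends up with NaN in Latitude and Longitude. Such rows cannot be placed on the map. The following is a small sketch of how to spot them, using tiny stand-in data frames. The "Example State" row and its value are invented purely to show a mismatch.

```python
import pandas as pd

# Tiny stand-ins for the two data frames used in this post
covid19_demo = pd.DataFrame({
    "State": ["Assam", "Bihar", "Example State"],  # last row is made up
    "Recoveries": [173213, 194005, 1],
})
coordinates_demo = pd.DataFrame({
    "State": ["Assam", "Bihar"],
    "Latitude": [26.749981, 25.785414],
    "Longitude": [94.216667, 87.479973],
})

# Same join as in the post
joined = covid19_demo.join(coordinates_demo.set_index("State"), on="State")

# Rows with no coordinates: the state name had no match in the coordinates table
unmatched = joined[joined["Latitude"].isna() | joined["Longitude"].isna()]
print("States without coordinates:", unmatched["State"].tolist())

# Keep only the rows that can be placed on the map
plottable = joined.dropna(subset=["Latitude", "Longitude"])
print(plottable)
```

If the list of unmatched states is not empty, the usual cause is a spelling difference between the Wikipedia table and the CSV file.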
Before plotting the bubble map, make sure the column you want to plot is of the float data type.
For example, I'm plotting the recoveries of each state on the map,
so I'll convert all entries in the Recoveries column to the float data type.
# Convert on covid (the joined data frame), since that is the one the map is drawn from
covid['Recoveries'] = covid['Recoveries'].astype(float)
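`astype(float)` only works when every cell already looks like a number. The Total cases column shown earlier contains the entry 201,407[b] (a thousands separator plus a Wikipedia footnote marker), which would raise an error. The following is a minimal sketch of a cleaning step for such cells. `parse_count` is a hypothetical helper, not part of the original code.

```python
import re

def parse_count(value):
    """Turn a scraped cell such as '201,407[b]' into a float."""
    text = str(value)
    text = re.sub(r"\[.*?\]", "", text)   # drop footnote markers like [b]
    text = text.replace(",", "").strip()  # drop thousands separators and spaces
    return float(text)

print(parse_count("201,407[b]"))  # 201407.0
print(parse_count(744532))        # 744532.0
```

Applied to a column, this could look like `covid['Total cases'] = covid['Total cases'].map(parse_count)`.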
Initialize the map m1:
# Make an empty map centered on India
m1 = folium.Map(location=[20.5937, 78.9629], zoom_start=5)
Convert each column of the data frame into a list:
state = list(covid['State'])
latitude = list(covid['Latitude'])
longitude = list(covid['Longitude'])
total = list(covid['Total cases'])
deaths = list(covid['Deaths'])
recovery = list(covid['Recoveries'])
active = list(covid['Active cases'])
# Add one circle per state; '<br>' starts a new line inside the popup
for s, lat, long, t, d, r, a in zip(state, latitude, longitude, total, deaths, recovery, active):
    folium.Circle(
        location=[lat, long],
        popup=folium.Popup(('State : ' + s + '<br>' +
                            'Total Cases : ' + str(t) + '<br>' +
                            'Deaths : ' + str(d) + '<br>' +
                            'Recoveries : ' + str(r) + '<br>' +
                            'Active Cases : ' + str(a)), max_width=200),
        radius=r * 0.2,  # radius is given in metres
        color='green',
        fill=True,
        fill_color='green'
    ).add_to(m1)
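`folium.Circle` takes its radius in metres, so `r * 0.2` gives Andhra Pradesh (744,532 recoveries) a radius of roughly 149 km, while Andaman and Nicobar Islands (3,892 recoveries) gets under 1 km and is hard to see at zoom level 5. One common alternative, not used in the original code, is to scale the radius with the square root of the value so that the bubble area is proportional to the count. The following is a sketch of that idea. `bubble_radius` and the 150 km maximum are assumptions chosen for illustration.

```python
import math

def bubble_radius(value, max_value, max_radius_m=150_000):
    """Radius in metres such that the circle area is proportional to value."""
    if max_value <= 0 or value <= 0:
        return 0.0
    return max_radius_m * math.sqrt(value / max_value)

# Recoveries from the first three rows of the merged table
recoveries = [3892.0, 744532.0, 10780.0]
largest = max(recoveries)
for r in recoveries:
    print(r, round(bubble_radius(r, largest)), "m")
```

In the loop above, `radius=r * 0.2` would then become `radius=bubble_radius(r, max(recovery))`.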
Since we are plotting the recoveries of each state, I'm using green for the bubbles.
You can save the map to an HTML file:
# Save it as HTML
m1.save('mymap.html')
Similarly, you can plot bubble maps for deaths and total cases in India.
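One way to reuse the same drawing code for the other columns is to keep the colour and scale factor per metric in a small table. The following is a sketch only: `METRIC_STYLES` and `circle_kwargs` are hypothetical names, and every colour and scale factor except green and 0.2 for recoveries is an assumption. Deaths are far smaller numbers than recoveries, so they likely need a larger scale factor to stay visible.

```python
# Colours and scale factors are assumptions, apart from green / 0.2 for recoveries
METRIC_STYLES = {
    "Recoveries": {"color": "green", "scale": 0.2},
    "Deaths": {"color": "red", "scale": 20},
    "Total cases": {"color": "blue", "scale": 0.2},
}

def circle_kwargs(row, metric):
    """Build keyword arguments for folium.Circle from one row (a dict or Series)."""
    style = METRIC_STYLES[metric]
    value = float(row[metric])
    return {
        "location": [row["Latitude"], row["Longitude"]],
        "popup": f"State : {row['State']}<br>{metric} : {value:.0f}",
        "radius": value * style["scale"],  # metres
        "color": style["color"],
        "fill": True,
        "fill_color": style["color"],
    }

# One row taken from the merged table in this post
bihar = {"State": "Bihar", "Deaths": 1003, "Latitude": 25.785414, "Longitude": 87.479973}
print(circle_kwargs(bihar, "Deaths"))
```

On a map, each row could then be drawn with `folium.Circle(**circle_kwargs(row, 'Deaths')).add_to(m1)` inside a loop such as `for _, row in covid.iterrows():`.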
Try this data visualization hands-on.
Happy learning!
Submitted by Kiran Reddy (kirankumarreddy)