Coders Packet

Prediction of COVID-19 cases with Python in Machine Learning

By Kondreddy Sujith

This Python machine learning project analyzes the spread of local COVID-19 transmission in Maharashtra.

It takes a dataset covering a specific period of time and predicts future case counts from it, using polynomial features together with a linear regression algorithm.

 

First, we import a few libraries (NumPy, pandas, seaborn and matplotlib), then upload the prepared dataset and load it. From there, we fit a regression on polynomial features of the day index.

# Upload the dataset file in CSV format (this step uses Google Colab's upload helper)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files
 
uploaded = files.upload()
 
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))


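# Load the uploaded CSV; the 'Confirmed' column holds the confirmed-case counts
df = pd.read_csv("Covid-19.csv", encoding='unicode_escape')
days = df['Confirmed']
x = np.arange(len(days))   # day index: 0, 1, 2, ...
y = days.values            # case count for each day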
 
df.tail()
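
Outside Google Colab, files.upload() is not available, so the loading step can be tried with any CSV source. The sketch below is a minimal illustration, assuming a hypothetical three-row CSV held in memory. The Date column and all values are assumptions made up for the example; only the Confirmed column name comes from the code above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for "Covid-19.csv"; the values are made up for illustration.
csv_text = "Date,Confirmed\n2020-03-24,10\n2020-03-25,14\n2020-03-26,21\n"

df = pd.read_csv(io.StringIO(csv_text))
days = df["Confirmed"]

x = np.arange(len(days))  # day index 0, 1, 2, ...
y = days.values           # case counts as a NumPy array

print(x)
print(y)
```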

 

 

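# Transform the day index into polynomial features using PolynomialFeatures
 
from sklearn.preprocessing import PolynomialFeatures   # import the polynomial feature transformer
# Note: the sample output below has three columns (1, x, x^2), which is what degree=2 produces;
# degree=3 would add a fourth column for x^3.
poly = PolynomialFeatures(degree=3)                    # create the transformer with the chosen degree
X = poly.fit_transform(x.reshape(-1,1))                # scikit-learn expects a 2-D column, hence the reshape
pd.DataFrame(X)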
       0      1        2
0    1.0    0.0      0.0
1    1.0    1.0      1.0
2    1.0    2.0      4.0
3    1.0    3.0      9.0
4    1.0    4.0     16.0
..   ...    ...      ...
151  1.0  151.0  22801.0
152  1.0  152.0  23104.0
153  1.0  153.0  23409.0
154  1.0  154.0  23716.0
155  1.0  155.0  24025.0

156 rows × 3 columns
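
For a single input column, the transformed matrix simply holds successive powers of the day index. As a rough cross-check that does not need scikit-learn, the sketch below builds the same three columns (1, x, x^2) with NumPy's np.vander. The five-row input is an arbitrary illustration, not the real dataset.

```python
import numpy as np

x = np.arange(5)  # illustrative day indices 0..4

# Columns are x**0, x**1, x**2, matching the three-column table above.
X_manual = np.vander(x, N=3, increasing=True).astype(float)

print(X_manual)
```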

 
# Use linear regression to fit the polynomial coefficients
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X, y)
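
Because the model is linear in the polynomial features, the fitted result is an ordinary polynomial whose coefficients can be read back from the regressor. The following is a minimal sketch on synthetic data; the quadratic 1 + 2x + 3x^2 and the *_demo names are assumptions chosen so that the expected coefficients are known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x_demo = np.arange(10)
y_demo = 1 + 2 * x_demo + 3 * x_demo**2  # synthetic data, not COVID-19 counts

poly_demo = PolynomialFeatures(degree=2)
X_demo = poly_demo.fit_transform(x_demo.reshape(-1, 1))
reg_demo = LinearRegression().fit(X_demo, y_demo)

# The constant term is shared between intercept_ and the coefficient of the
# all-ones column, so add them together; coef_[1:] are the x and x**2 terms.
constant = reg_demo.intercept_ + reg_demo.coef_[0]
print(constant, reg_demo.coef_[1:])
```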

 

# Plot the result: the scattered points are the reported case counts and the red line is the fitted polynomial.
# The plot also gives a visual check of how well the chosen degree fits the data.
Yp = reg.predict(X)
dates = pd.date_range(start="2020-03-24", end="2020-08-26")   # the dataset runs from 24 March 2020 to 26 August 2020 (156 days)
plt.scatter(dates, y)
plt.plot(dates, Yp, color='red')
plt.show()
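
The plot gives a visual impression of the fit, and a numeric score can back it up when deciding on the degree. The sketch below computes the coefficient of determination (R^2) for a few degrees, using np.polyfit as a stand-in for the scikit-learn steps and synthetic quadratic data rather than the real dataset. The helper r2_for_degree is a hypothetical name introduced here.

```python
import numpy as np

x_demo = np.arange(30, dtype=float)
y_demo = 5 + 0.5 * x_demo + 2 * x_demo**2  # synthetic quadratic growth


def r2_for_degree(x, y, degree):
    """Fit a polynomial of the given degree and return R^2 on the same data."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


scores = {d: r2_for_degree(x_demo, y_demo, d) for d in (1, 2, 3)}
for degree, score in scores.items():
    print(degree, round(score, 4))
```

A score computed on the training data never gets worse as the degree rises, so it is best read alongside the plot or a held-out portion of the data.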
 
 
# Predict the number of cases by passing a day index. With index 0 as 24 March 2020,
# index 159 corresponds to 30 Aug 2020 (27 Aug 2020 would be index 156).
reg.predict(poly.transform([[159]]))

Output (the decimal places may differ slightly):

array([770915.57187574])
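
Converting between calendar dates and the numeric day index by hand is error-prone. A small helper can do the arithmetic, assuming (as in the plotting code) that index 0 corresponds to 24 March 2020. The names START_DATE and day_index are hypothetical and not part of the original code.

```python
from datetime import date

START_DATE = date(2020, 3, 24)  # first day of the dataset, index 0


def day_index(target):
    """Return the number of days between START_DATE and target."""
    return (target - START_DATE).days


print(day_index(date(2020, 8, 26)))  # last day in the dataset
print(day_index(date(2020, 8, 27)))
print(day_index(date(2020, 8, 30)))
```

The returned index can then be passed to the model in the same way as the literal value, for example reg.predict(poly.transform([[day_index(date(2020, 8, 30))]])).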

 
