Predicting COVID-19 Cases with Machine Learning in Python
This Python machine learning project analyzes the local transmission of COVID-19 in Maharashtra. Using a dataset of confirmed cases over a fixed period, it predicts future case counts by fitting a regression model on polynomial features.
First, we import the required libraries (NumPy, pandas, seaborn and matplotlib) and upload the prepared dataset. We then transform the data into polynomial features and fit a regression model to it.
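In equation form, the approach fits a polynomial of degree $d$ in the day index $x$ (the code uses $d = 3$) and chooses the coefficients $\beta_j$ by ordinary least squares over the $n$ days in the dataset. This is the standard formulation of polynomial regression, written out here as a sketch:

$$\hat{y}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d$$

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=0}^{n-1} \left( y_i - \hat{y}(x_i) \right)^2$$

The model is still linear in the coefficients, which is why an ordinary linear regression can fit it once $x$ has been expanded into the columns $1, x, \dots, x^d$.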
```python
# Upload the dataset file in CSV format (works in Google Colab only)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from google.colab import files

uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

# Read the uploaded file; 'unicode_escape' tolerates stray non-UTF-8 bytes
df = pd.read_csv("Covid-19.csv", encoding='unicode_escape')
confirmed = df['Confirmed']        # confirmed cases, one row per day
x = np.arange(len(confirmed))      # day index: 0, 1, 2, ...
y = confirmed.values
df.tail()
```
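Outside Google Colab, `files.upload()` is not available and the CSV can be read straight from disk. The sketch below shows how the day index `x` and the target `y` are built from a `Confirmed` column. It assumes a tiny in-memory CSV: the `Date` column and every number in it are made up for illustration and are not the project's data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Covid-19.csv; values are made up, NOT real case counts.
csv_text = """Date,Confirmed
2020-03-24,100
2020-03-25,130
2020-03-26,170
2020-03-27,220
"""

# With a real file on disk this would be pd.read_csv("Covid-19.csv")
df = pd.read_csv(io.StringIO(csv_text))
confirmed = df["Confirmed"]
x = np.arange(len(confirmed))   # day index: 0 for the first row, 1 for the next, ...
y = confirmed.values

print(x)
print(y)
```

Only the row order matters to the model: each row becomes one day index, so the file should have exactly one row per day with no gaps.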
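Next, we transform the day index into polynomial features with scikit-learn's `PolynomialFeatures`, which expands each value x into the columns 1, x, x², and so on up to the chosen degree.
```python
from sklearn.preprocessing import PolynomialFeatures

# Expand each day index x into the columns 1, x, x^2, ..., x^degree.
# Note: degree=3 gives four columns (1, x, x^2, x^3); the three-column
# output shown below (1, x, x^2) is what degree=2 produces.
poly = PolynomialFeatures(degree=3)
X = poly.fit_transform(x.reshape(-1, 1))   # one feature per row
pd.DataFrame(X)
```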
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 |
| 1 | 1.0 | 1.0 | 1.0 |
| 2 | 1.0 | 2.0 | 4.0 |
| 3 | 1.0 | 3.0 | 9.0 |
| 4 | 1.0 | 4.0 | 16.0 |
| ... | ... | ... | ... |
| 151 | 1.0 | 151.0 | 22801.0 |
| 152 | 1.0 | 152.0 | 23104.0 |
| 153 | 1.0 | 153.0 | 23409.0 |
| 154 | 1.0 | 154.0 | 23716.0 |
| 155 | 1.0 | 155.0 | 24025.0 |
156 rows × 3 columns
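The number of output columns depends directly on the degree: `PolynomialFeatures` with its default `include_bias=True` produces `degree + 1` columns for a single input feature. The short check below, on five illustrative day indices, compares degree 2 and degree 3.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(5).reshape(-1, 1)   # five day indices as a single column

shapes = {}
rows_for_day2 = {}
for degree in (2, 3):
    X = PolynomialFeatures(degree=degree).fit_transform(x)
    # Columns are 1, x, x^2, ..., x^degree
    shapes[degree] = X.shape
    rows_for_day2[degree] = X[2].tolist()
    print(degree, X.shape, X[2])
```

Degree 2 yields three columns (1, x, x²), matching the table above; degree 3 adds a fourth column with x³.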
We then use linear regression to fit the model parameters:
```python
from sklearn.linear_model import LinearRegression

# Ordinary least squares on the polynomial features
reg = LinearRegression()
reg.fit(X, y)
```
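The two steps can also be chained with scikit-learn's `make_pipeline`, so that `predict()` applies the same polynomial transform automatically. This is an alternative arrangement, not the project's original code. The sketch fits noise-free synthetic data from the known quadratic y = 2 + 3x + 0.5x² (made up for illustration) to confirm that the fitted coefficients come back as expected. It sets `include_bias=False` because `LinearRegression` already fits its own intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data from a known quadratic: y = 2 + 3x + 0.5x^2
x = np.arange(20).reshape(-1, 1)
y = 2 + 3 * x.ravel() + 0.5 * x.ravel() ** 2

# The pipeline runs PolynomialFeatures before LinearRegression on both
# fit() and predict(), so a separate poly.transform() call is not needed.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

reg = model.named_steps["linearregression"]
print(reg.intercept_, reg.coef_)   # approximately 2.0 and [3.0, 0.5]
print(model.predict([[25]]))       # approximately 2 + 75 + 312.5 = 389.5
```

Bundling both steps removes the most common mistake at prediction time: passing a raw day index to a model that was trained on polynomial features.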
The following code plots the actual confirmed cases as blue points and the fitted polynomial curve as a red line. Comparing the two shows how closely the model follows the data and helps in choosing the polynomial degree.
```python
# The dataset runs from 24 March 2020 to 26 August 2020 (156 days)
dates = pd.date_range(start="2020-03-24", end="2020-08-26")
Yp = reg.predict(X)                 # fitted value for each day
plt.scatter(dates, y)               # actual confirmed cases (blue points)
plt.plot(dates, Yp, color='red')    # polynomial regression curve
plt.show()
```
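Visual inspection can be backed by numbers. The sketch below is an illustration on a made-up, smoothly growing series, not the project's data. It fits several degrees on the first 50 days, reports R² on those days via `reg.score`, and reports the mean absolute error on the 10 held-out days via `sklearn.metrics.mean_absolute_error`. The 50/10 split is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PolynomialFeatures

# Made-up, smoothly growing series standing in for cumulative cases
x = np.arange(60).reshape(-1, 1)
y = 50 * np.exp(0.05 * x.ravel())

train, test = slice(0, 50), slice(50, 60)   # hold out the last 10 days

train_r2, test_mae = {}, {}
for degree in (1, 2, 3, 4):
    X = PolynomialFeatures(degree=degree).fit_transform(x)
    reg = LinearRegression().fit(X[train], y[train])
    train_r2[degree] = reg.score(X[train], y[train])   # R^2 on the fitted days
    test_mae[degree] = mean_absolute_error(y[test], reg.predict(X[test]))
    print(degree, round(train_r2[degree], 4), round(test_mae[degree], 1))
```

Training R² can only stay the same or rise as the degree grows, because each higher-degree model contains the lower one. For that reason, the error on held-out days is the more honest guide when picking a degree for forecasting.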
To predict the number of cases on a future day, pass its day index through the same polynomial transform. Counting 24 March 2020 as day 0, day 159 is 30 August 2020 (27 August 2020 would be day 156):
```python
reg.predict(poly.transform([[159]]))
```
Output (the decimal places may vary slightly):
```
array([770915.57187574])
```
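Counting days by hand is error-prone, so the day index can be computed from a calendar date instead. The sketch below uses a hypothetical helper, `day_index`, which assumes day 0 is 24 March 2020, the first date in the dataset.

```python
import pandas as pd

START = pd.Timestamp("2020-03-24")   # first day in the dataset = day index 0

def day_index(date):
    """Hypothetical helper: return the model's day index for a calendar date."""
    return (pd.Timestamp(date) - START).days

print(day_index("2020-08-26"))   # last day in the dataset
print(day_index("2020-08-27"))
print(day_index("2020-08-30"))
```

With the fitted model, a date-based prediction would then read `reg.predict(poly.transform([[day_index("2020-08-27")]]))`.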