This Python machine learning project aimed to analyze the spread of local COVID-19 transmission in Maharashtra.

It takes a dataset covering a specific period of time and predicts future case counts from it using polynomial features with a linear regression algorithm.

First, we import the NumPy, pandas, seaborn, and matplotlib libraries, then upload the prepared dataset and load it. From there, we build polynomial features and fit a regression on them.
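For reference, the model being fitted can be written out explicitly. With degree 3 (the value passed to `PolynomialFeatures` in the code), and writing $x$ for the day index and $\beta_0, \dots, \beta_3$ for the fitted coefficients (symbols introduced here only for illustration), the prediction is:

```latex
\hat{y}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3
```

The expression is a polynomial in $x$ but linear in the coefficients, which is why an ordinary linear regression can fit it once $x$ has been expanded into the columns $1, x, x^2, x^3$.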

#Upload the data set file in CSV format

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from google.colab import files

    # Upload the CSV through the Colab file picker
    uploaded = files.upload()
    for fn in uploaded.keys():
        print('User uploaded file "{name}" with length {length} bytes'.format(
            name=fn, length=len(uploaded[fn])))

    # Load the uploaded file and take the confirmed-case column
    df = pd.read_csv("Covid-19.csv", encoding='unicode_escape')
    days = df['Confirmed']
    x = np.arange(len(days))   # day index: 0, 1, 2, ...
    y = days.values            # confirmed cases per day index
    df.tail()
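The upload step above depends on Google Colab. As a rough sketch of the same loading logic outside Colab, the snippet below reads a CSV from an in-memory string instead of an uploaded file. The four values are made-up placeholders, not real case counts, and only the `Confirmed` column name is taken from the original code.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Covid-19.csv: made-up numbers, not real data.
csv_text = """Confirmed
10
12
15
19
"""

df = pd.read_csv(io.StringIO(csv_text))
days = df["Confirmed"]
x = np.arange(len(days))   # day index starting at 0
y = days.to_numpy()        # case counts as a NumPy array

print(x.tolist())  # [0, 1, 2, 3]
print(y.tolist())  # [10, 12, 15, 19]
```

With a real file on disk, `io.StringIO(csv_text)` would simply be replaced by the file path.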

# Transform the data into polynomial features using the PolynomialFeatures class

    from sklearn.preprocessing import PolynomialFeatures  # import the polynomial feature transformer

    poly = PolynomialFeatures(degree=3)        # poly holds the transformer with the chosen degree
    X = poly.fit_transform(x.reshape(-1, 1))   # reshape the day index to a single column first
    pd.DataFrame(X)

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 |
| 1 | 1.0 | 1.0 | 1.0 |
| 2 | 1.0 | 2.0 | 4.0 |
| 3 | 1.0 | 3.0 | 9.0 |
| 4 | 1.0 | 4.0 | 16.0 |
| ... | ... | ... | ... |
| 151 | 1.0 | 151.0 | 22801.0 |
| 152 | 1.0 | 152.0 | 23104.0 |
| 153 | 1.0 | 153.0 | 23409.0 |
| 154 | 1.0 | 154.0 | 23716.0 |
| 155 | 1.0 | 155.0 | 24025.0 |

156 rows × 3 columns
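The number of columns produced by `PolynomialFeatures` depends on the degree. A three-column output like the table above (constant, x, x²) is what `degree=2` produces, while `degree=3` adds a fourth, cubic column. The small sketch below, using a five-element toy input rather than the real dataset, shows the difference:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy input: day indices 0..4 as a single column.
x = np.arange(5).reshape(-1, 1)

X2 = PolynomialFeatures(degree=2).fit_transform(x)  # columns: 1, x, x^2
X3 = PolynomialFeatures(degree=3).fit_transform(x)  # columns: 1, x, x^2, x^3

print(X2.shape)          # (5, 3)
print(X3.shape)          # (5, 4)
print(X3[4].tolist())    # [1.0, 4.0, 16.0, 64.0]
```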

# Use linear regression to fit the parameters

    from sklearn.linear_model import LinearRegression

    reg = LinearRegression()
    reg.fit(X, y)
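To see what the fit actually learns, it can help to run the same two steps on data whose coefficients are known in advance. The sketch below uses a synthetic cubic (made-up numbers, not the COVID-19 dataset) and checks that the regression recovers the coefficients it was generated from.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data from a known cubic: y = 5 + 2x + 0.5x^2 + 0.1x^3
x = np.arange(20)
y = 5.0 + 2.0 * x + 0.5 * x**2 + 0.1 * x**3

poly = PolynomialFeatures(degree=3)
X = poly.fit_transform(x.reshape(-1, 1))

reg = LinearRegression().fit(X, y)

# LinearRegression fits its own intercept, so the constant term shows up
# in reg.intercept_ and the powers of x in reg.coef_[1:].
print("intercept:", reg.intercept_)
print("coefficients for x, x^2, x^3:", reg.coef_[1:])
print("R^2 on the training data:", reg.score(X, y))
```

Because the data here is noise-free, the recovered values should match the generating coefficients up to floating-point error. On real case counts the fit will be approximate.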

# Plot the result: the blue points show the number of confirmed cases and the red line shows the fitted polynomial. The plot also gives a visual indication of how well the fit tracks the data and whether the chosen degree is suitable.

    Yp = reg.predict(X)

    # The dataset runs from 24 March 2020 to 26 August 2020
    dates = pd.date_range(start="2020-03-24", end="2020-08-26")

    plt.scatter(dates, y)              # observed confirmed cases
    plt.plot(dates, Yp, color='red')   # fitted polynomial
    plt.show()
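Judging the degree from a plot can be complemented with a numeric check. One possible approach, not part of the original project, is to hold out the last few days, fit on the rest, and compare the error on the held-out days for several degrees. The sketch below does this on synthetic data (made-up quadratic growth with noise), so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the case counts (made-up, not real data):
# quadratic growth plus random noise, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
x = np.arange(100)
y = 3.0 * x**2 + 50.0 * x + rng.normal(scale=200.0, size=x.size)

# Fit on the first 80 days, evaluate on the last 20.
x_train, x_test = x[:80].reshape(-1, 1), x[80:].reshape(-1, 1)
y_train, y_test = y[:80], y[80:]

holdout_mae = {}
for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree=degree)
    reg = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    pred = reg.predict(poly.transform(x_test))
    holdout_mae[degree] = mean_absolute_error(y_test, pred)

for degree, mae in holdout_mae.items():
    print(f"degree={degree}: hold-out MAE = {mae:.1f}")
```

A held-out comparison is used here because the error on the training data can only stay the same or shrink as the degree grows, which makes it a poor guide for forecasting beyond the observed range.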

# Predict the number of cases by passing a day index as a number, for instance 159 (30 August 2020, counting 24 March 2020 as day 0)

reg.predict(poly.transform([[159]]))

Output (the decimal places may vary slightly):

array([770915.57187574])
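Since the model takes a day index rather than a calendar date, a small helper can make the conversion explicit. The sketch below assumes the 0-based index produced by `np.arange`, with 24 March 2020 (the first date in the plotted range) as day 0. The function name `day_index` is hypothetical and not part of the original code.

```python
from datetime import date

START = date(2020, 3, 24)  # first row of the dataset, i.e. day index 0


def day_index(d: date) -> int:
    """Return the 0-based day index of d relative to START."""
    return (d - START).days


print(day_index(date(2020, 8, 26)))  # 155 (last row of the dataset)
print(day_index(date(2020, 8, 27)))  # 156
print(day_index(date(2020, 8, 30)))  # 159
```

The returned index is the value to pass to the model, for example `reg.predict(poly.transform([[day_index(date(2020, 8, 30))]]))`.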

Submitted by Kondreddy Sujith (sujith)
