Implementation of the Linear Regression Algorithm with Gradient Descent in C++

In this program, we predict results from real-life data using the linear regression algorithm, implemented in C++.

This program can be used to understand the linear regression algorithm with gradient descent and to predict outputs from real-life data, for example:

1) In computational biology, predicting whether a cancer is fatal based on the size of the tumor.

2) Predicting the climate based on leaf color change and the humidity of the air.

3) Predicting the safety of a person in a car based on how the airbag inflates during an accident.

These are examples of machine learning. Machine learning is commonly divided into two types: supervised and unsupervised learning. Linear regression comes under supervised learning and is one of the important algorithms in machine learning.

Linear regression:

Linear regression is the algorithm that uses the linear function Hθ(x) = θ0 + θ1*x to predict the output for continuously varying inputs, where Hθ(x) is the predicted output y, and θ0 and θ1 are the parameters that are adjusted to fit the function to the data.
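As a minimal sketch, the linear function above can be written as a one-line C++ function. The name `predict` and its parameter names are illustrative and do not come from the original program.

```cpp
// Hypothesis for single-variable linear regression: H(x) = theta0 + theta1 * x.
// `predict` is a hypothetical helper name used only for this illustration.
double predict(double theta0, double theta1, double x) {
    return theta0 + theta1 * x;
}
```

For example, `predict(1.0, 2.0, 3.0)` evaluates 1 + 2 * 3.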

To understand this, suppose we create a linear function from past data and then use it to predict future outputs. To find the linear function we need datasets, also called training sets (x, y). Here we use only one variable, i.e., y varies with x only. This program applies only to n continuous-valued datasets, where n is the number of datasets.

More precisely, this program works well for datasets whose values differ in the decimals or by single digits; an example is shown below. It cannot be applied to discrete values. Another important point is that the initial theta 0 and theta 1 values must be chosen carefully, since we have to assume them.

A real-life example of humidity change with temperature in an area is shown below. We have n = 9 datasets. For this data, we can assume theta 0 = 0 and theta 1 = 0.193, or you can try y/x (from a single dataset) for theta 1.

| Temperature (deg C) | Humidity |
|---|---|
| 9.47 | 0.89 |
| 9.35 | 0.86 |
| 9.37 | 0.89 |
| 8.25 | 0.83 |
| 8.75 | 0.83 |
| 9.22 | 0.85 |
| 7.33 | 0.95 |
| 8.77 | 0.89 |
| 10.82 | 0.82 |
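One way to hold the nine samples above in code, together with the y/x starting guess, is sketched here. The names `temperature`, `humidity`, `kSamples` and `initialSlopeGuess` are assumptions made for this illustration; the original program stores the values in plain arrays x[] and y[].

```cpp
#include <array>
#include <cstddef>

// The nine (temperature, humidity) samples from the example data.
constexpr std::size_t kSamples = 9;
const std::array<double, kSamples> temperature = {
    9.47, 9.35, 9.37, 8.25, 8.75, 9.22, 7.33, 8.77, 10.82};
const std::array<double, kSamples> humidity = {
    0.89, 0.86, 0.89, 0.83, 0.83, 0.85, 0.95, 0.89, 0.82};

// Rough starting value for theta 1 taken as y/x from sample i.
// Hypothetical helper; assumes i < kSamples and temperature[i] != 0.
double initialSlopeGuess(std::size_t i) {
    return humidity[i] / temperature[i];
}
```

Calling `initialSlopeGuess(0)` divides the first humidity value by the first temperature value. This is only a starting point; gradient descent is still expected to adjust it.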

In the program, the x and y values are stored in the arrays x[] and y[]. The stopping condition in the code depends on the theta values. In the code below, the theta values are optimized until the error is less than two, so the condition should be chosen according to the initial value of j (the cost function).

```cpp
if ((s <= 0) && (t < 2))   /* Stop once the cost function is small enough */
{
    goto result;
}
else
{
    for (i = 0; i < m; i++)   /* Accumulate the error sums used to update theta 0 and theta 1 */
    {
        e = e + ((a + (b * x[i])) - y[i]);
        f = f + (((a + (b * x[i])) - y[i]) * x[i]);
    }
}
```

In linear regression, we have to estimate the cost function as shown below. It is the squared error function.
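The squared-error cost is commonly written as J(θ0, θ1) = (1/(2m)) * Σ (Hθ(x_i) - y_i)², summed over all m training pairs. The 1/(2m) scaling is an assumption here, chosen because it is consistent with the `alpha / m` factor in this program's update code. A minimal sketch of that function follows; the name `cost` and the use of `std::vector` are illustrative choices, not taken from the original program.

```cpp
#include <cstddef>
#include <vector>

// Squared-error cost:
//   J(theta0, theta1) = (1 / (2m)) * sum((theta0 + theta1 * x[i] - y[i])^2)
// Assumes x and y have the same non-zero length m.
double cost(const std::vector<double>& x, const std::vector<double>& y,
            double theta0, double theta1) {
    const std::size_t m = x.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        const double error = (theta0 + theta1 * x[i]) - y[i];
        sum += error * error;
    }
    return sum / (2.0 * static_cast<double>(m));
}
```

A cost of zero means the line passes through every training point; larger values mean a worse fit.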

The code uses the above function in a for loop to find the j0 and j1 values. The objective of the algorithm is to minimize the cost function by varying theta 0 and theta 1, so that we get a good linear function and the predicted result is valid. The algorithm that finds the theta values automatically, to get a good straight-line fit (i.e., one with minimal error) for the linear equation, is called gradient descent. In this program, linear regression is the combination of the linear function with gradient descent. The formula and code for gradient descent are shown below.
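For reference, the gradient descent update in its usual form for one input variable is written out here. It is reconstructed to match this program's code, where `a` and `b` play the roles of θ0 and θ1 and `r = alpha / m` is the step factor; α (the learning rate) and m (the number of training pairs) are the only other symbols involved.

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(H_\theta(x_i) - y_i\bigr) \\
\theta_1 &:= \theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(H_\theta(x_i) - y_i\bigr)\, x_i
\end{aligned}
```

Both right-hand sides are evaluated with the old θ0 and θ1 before either value is overwritten, which is why the code goes through `temp0` and `temp1`.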

The theta values are updated repeatedly so that they minimize the cost function j(θ0, θ1).

```cpp
for (i = 0; i < m; i++)   /* Accumulate the error sums used to update theta 0 and theta 1 */
{
    if (s != 0)
    {
        e = e + ((a + (b * x[i])) - y[i]);
    }
    if (t != 0)
    {
        f = f + (((a + (b * x[i])) - y[i]) * x[i]);
    }
}
r = (alpha / m);
temp0 = a - (r * e);   /* Compute both new values before assigning (simultaneous update) */
temp1 = b - (r * f);
a = temp0;
b = temp1;
}   /* Closes the enclosing block, which is not shown in this excerpt */
```
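Since the excerpt above relies on variables declared elsewhere, a self-contained sketch of the same update loop may help. It is not the original program: the `Line` struct, the `fitLine` function, the fixed iteration count, and the use of `std::vector` are all assumptions made for illustration, and the sums `e` and `f` are reset at the start of every pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: the fitted line H(x) = theta0 + theta1 * x.
struct Line {
    double theta0;  // intercept
    double theta1;  // slope
};

// Batch gradient descent for a single input variable.
// Assumes x and y have the same non-zero length. `alpha` (learning rate) and
// `iterations` are tuning values chosen by the caller.
Line fitLine(const std::vector<double>& x, const std::vector<double>& y,
             Line start, double alpha, int iterations) {
    const double m = static_cast<double>(x.size());
    Line line = start;
    for (int step = 0; step < iterations; ++step) {
        double e = 0.0;  // sum of errors (gradient term for theta0)
        double f = 0.0;  // sum of errors times x (gradient term for theta1)
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double error = (line.theta0 + line.theta1 * x[i]) - y[i];
            e += error;
            f += error * x[i];
        }
        const double r = alpha / m;
        // Compute both new values first, then assign (simultaneous update).
        const double temp0 = line.theta0 - r * e;
        const double temp1 = line.theta1 - r * f;
        line.theta0 = temp0;
        line.theta1 = temp1;
    }
    return line;
}
```

On data that lies exactly on a line, such as x = {0, 1, 2, 3, 4} and y = {1, 3, 5, 7, 9}, calling `fitLine(x, y, Line{0.0, 0.0}, 0.05, 5000)` should approach theta0 = 1 and theta1 = 2. The learning rate has to be small enough for the updates to converge, and larger x values generally call for a smaller `alpha`.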

Once the cost function has been minimized, we have the linear function, and we can use it to predict the output value for various inputs.

The test run is shown below: