Random forest algorithm
1. It is a machine learning algorithm
2. It combines the output of multiple decision trees to reach a single result
3. It handles both classification and regression problems
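The "single result" in point 2 can be sketched in a few lines. This is only an illustration with made-up per-tree outputs, not the internals of any library: the function names and the sample values are assumptions. For regression the tree outputs are averaged; for classification the textbook description is a majority vote (some libraries average class probabilities instead).

```python
from collections import Counter
from statistics import mean


def combine_regression(tree_predictions):
    """Average the per-tree numeric predictions into one value."""
    return mean(tree_predictions)


def combine_classification(tree_votes):
    """Return the class label predicted by the most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical outputs of five decision trees for one sample
regression_outputs = [520.0, 560.0, 540.0, 500.0, 580.0]
classification_outputs = ["high", "low", "high", "high", "low"]

forest_value = combine_regression(regression_outputs)
forest_label = combine_classification(classification_outputs)

print(forest_value)  # 540.0
print(forest_label)  # high
```

Because each tree makes different errors, averaging or voting tends to be more stable than relying on any single tree.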
Python program applying the random forest algorithm to a data set
Here I am taking the data set from a spreadsheet file. The program downloads it and reads it as CSV, and I then apply the random forest algorithm to that data.
Python program
# Importing the libraries
import numpy as np  # for array operations
import pandas as pd  # for working with DataFrames
import requests, io  # for HTTP requests and I/O commands
import matplotlib.pyplot as plt  # for data visualization
%matplotlib inline

# scikit-learn modules
from sklearn.model_selection import train_test_split  # for splitting the data
from sklearn.metrics import mean_squared_error  # for calculating the cost function
from sklearn.ensemble import RandomForestRegressor  # for building the model

# Importing the dataset from the url of the data set
url = "https://drive.google.com/u/0/uc?id=1mVmGNx6cbfvRHC_DvF12ZL3wGLSHD9f_&export=download"
data = requests.get(url).content

# Reading the data
dataset = pd.read_csv(io.StringIO(data.decode('utf-8')))
dataset.head()

x = dataset.drop('Petrol_Consumption', axis = 1)  # Features
y = dataset['Petrol_Consumption']  # Target

# Splitting the dataset into training and testing set (80/20)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 28)

# Initializing the Random Forest Regression model with 10 decision trees
model = RandomForestRegressor(n_estimators = 10, random_state = 0)

# Fitting the Random Forest Regression model to the data
model.fit(x_train, y_train)

# Predicting the target values of the test set
y_pred = model.predict(x_test)

# RMSE (Root Mean Square Error)
rmse = float(format(np.sqrt(mean_squared_error(y_test, y_pred)),'.3f'))
print("\nRMSE:\n",rmse)
Output :-
Root Mean Square Error (RMSE): 96.389
Conclusion:-
This RMSE value gives you an idea of how well the model is performing. It measures the typical size of the prediction error, in the same units as the target (Petrol_Consumption). The lower the RMSE, the more closely the model’s predictions match the actual values.
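To make the metric concrete, here is a small hand calculation of RMSE using only the standard library. The helper name and the three sample values are made up for illustration and are not taken from the petrol data set.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted values
actual = [500.0, 600.0, 700.0]
predicted = [510.0, 580.0, 700.0]

# Squared errors are 100, 400 and 0, so the mean is about 166.667
value = rmse(actual, predicted)
print(round(value, 3))  # 12.91

# Perfect predictions give an RMSE of zero
print(rmse(actual, actual))  # 0.0
```

Large individual errors are squared before averaging, so a few bad predictions raise the RMSE more than many small ones.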