Implementation of the Bagging Method on the Iris Dataset

This Jupyter Notebook illustrates a machine learning pipeline on the Iris dataset. The pipeline consists of data loading, data preprocessing, model training, hyperparameter tuning, and evaluation of a random forest classifier.

Dataset Description: The Iris dataset is one of the classic datasets in machine learning. It contains 150 samples from three species of Iris flower (Setosa, Versicolor, Virginica).

Each sample has four features: sepal length, sepal width, petal length, and petal width. The goal is to classify the species from these features.

Load the Dataset: The dataset is loaded with sklearn.datasets.load_iris(). The imports needed for the whole pipeline are:

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
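Before splitting, it can help to confirm that the loaded data matches the description above. The following is a minimal sketch, not part of the original notebook; the variable name class_counts is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples, 4 features per sample
print("Shape:", iris.data.shape)

# Names of the four measurements and the three species
print("Features:", iris.feature_names)
print("Classes:", list(iris.target_names))

# Number of samples per species (targets are encoded as 0, 1, 2)
class_counts = np.bincount(iris.target)
print("Samples per class:", dict(zip(iris.target_names, class_counts.tolist())))
```

The classes are balanced, with 50 samples each, which is one reason plain accuracy is a reasonable metric for this dataset.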

Data Splitting: The data is divided into training and test sets in an 80-20 ratio.

iris = load_iris()

X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
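A quick check of the split sizes is sketched below. The stratified variant is an assumption added for illustration and is not used in the original notebook: passing stratify=y keeps the class proportions equal in both subsets, which may matter with only 30 test samples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the notebook: 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Train:", X_train.shape, "Test:", X_test.shape)
print("Test samples per class:", np.bincount(y_test))

# Optional variant (assumption, not in the original): stratified split
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print("Stratified test samples per class:", np.bincount(y_test_s))
```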

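Pipeline Creation: StandardScaler standardizes the features and a RandomForestClassifier is trained on the result, both inside a single Pipeline. A random forest is a bagging ensemble: with bootstrap=True, each tree is fit on a bootstrap sample of the training data. The parameter grid lists the hyperparameter values to search: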

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42))
])

param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [None, 10, 20, 30],
    'classifier__min_samples_split': [5, 10],
    'classifier__min_samples_leaf': [2, 4],
    'classifier__bootstrap': [True, False],
    'classifier__max_features': ['log2', 'sqrt']
}
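To get a feel for how large this search is, the number of candidate combinations can be counted with sklearn.model_selection.ParameterGrid. This is a small sketch, not part of the original notebook; n_candidates and n_fits are illustrative names, and the fold count of 5 is the cv value used with GridSearchCV in this notebook.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [None, 10, 20, 30],
    'classifier__min_samples_split': [5, 10],
    'classifier__min_samples_leaf': [2, 4],
    'classifier__bootstrap': [True, False],
    'classifier__max_features': ['log2', 'sqrt']
}

# Every combination of the listed values is one candidate
n_candidates = len(ParameterGrid(param_grid))

# With 5-fold cross-validation, each candidate is fit 5 times
n_fits = n_candidates * 5

print("Candidates:", n_candidates)  # 3 * 4 * 2 * 2 * 2 * 2 = 192
print("Cross-validation fits:", n_fits)  # 192 * 5 = 960
```

Note that the candidates with bootstrap=False train every tree on the full training set, so those configurations are no longer bagging in the strict sense.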

Hyperparameter Tuning: GridSearchCV evaluates every combination in the parameter grid with 5-fold cross-validation and keeps the one with the highest mean accuracy:

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, scoring='accuracy', cv=5, verbose=1)

grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_

y_pred = best_model.predict(X_test)
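After fitting, the selected hyperparameters and their cross-validated score are available as best_params_ and best_score_. The sketch below is self-contained and uses a deliberately reduced grid (small_grid, an assumption made here so that it runs quickly); the same attributes can be read from the grid_search object fitted above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Reduced grid for illustration only (not the grid from the notebook)
small_grid = {
    'classifier__n_estimators': [50, 100],
    'classifier__max_depth': [None, 10],
}

grid_search = GridSearchCV(pipeline, small_grid, scoring='accuracy', cv=5)
grid_search.fit(X_train, y_train)

# Best combination and its mean cross-validated accuracy on the training set
print("Best parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)
```

The value in best_score_ is computed on the training folds only, so it is an estimate that the test-set evaluation should confirm.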

Model Evaluation: Accuracy and a classification report are used to assess model performance on the test set.

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
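A confusion matrix can complement these two metrics by showing which species are mistaken for which. The sketch below is not part of the original notebook; to stay self-contained it fits a default RandomForestClassifier as a stand-in for the tuned best_model.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in model (assumption): default forest instead of the tuned pipeline
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Entries on the diagonal count correct predictions; any off-diagonal entry identifies a specific pair of species the model confuses.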
