Implementation of Bagging Method on Iris Dataset

This Jupyter Notebook illustrates a machine learning pipeline on the Iris dataset. The pipeline consists of data loading, preprocessing, model training, hyperparameter tuning, and evaluation of a random forest classifier.

Dataset Description: The Iris dataset is a classic machine learning dataset. It contains 150 samples from three species of Iris flower (Setosa, Versicolor, Virginica), 50 per species.

Each sample has four features: sepal length, sepal width, petal length, and petal width. The goal is to classify the species from these features.
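The figures above can be checked directly against the copy of the dataset that ships with scikit-learn. The snippet below is a minimal sketch that only prints the array shape and the feature and class names stored in the loaded object.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples with 4 numeric features each
print(iris.data.shape)

# Feature and class names as stored in the scikit-learn dataset object
print(iris.feature_names)
print(list(iris.target_names))
```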

Load the Dataset: The dataset is imported using sklearn.datasets.load_iris().

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

Data Splitting: The data is divided into training and test sets with an 80/20 ratio.

iris = load_iris()

X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
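As a quick sanity check, the sketch below repeats the same split and prints the resulting shapes. It also shows an optional variant with stratify=y, which is an assumption on our part and not something the notebook uses. Stratifying keeps the class proportions the same in both subsets, which can matter on a dataset this small.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same call as in the notebook: 20% of 150 samples go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)

# Optional variant (an assumption, not used in the notebook):
# stratify=y preserves the class balance in both subsets
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
test_class_counts = Counter(y_te_s.tolist())
print(test_class_counts)
```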

Pipeline Creation: A Pipeline standardizes the features with StandardScaler and then trains a RandomForestClassifier. The parameter grid lists the hyperparameter values to search:

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42))
])

param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [None, 10, 20, 30],
    'classifier__min_samples_split': [5, 10],
    'classifier__min_samples_leaf': [2, 4],
    'classifier__bootstrap': [True, False],
    'classifier__max_features': ['log2', 'sqrt']
}
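It can help to know how large this search is before running it. The sketch below uses scikit-learn's ParameterGrid to count the candidate combinations in the grid above, then multiplies by the 5 cross-validation folds used in the grid search. The variable names n_candidates and n_fits are ours, chosen for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "classifier__n_estimators": [50, 100, 200],
    "classifier__max_depth": [None, 10, 20, 30],
    "classifier__min_samples_split": [5, 10],
    "classifier__min_samples_leaf": [2, 4],
    "classifier__bootstrap": [True, False],
    "classifier__max_features": ["log2", "sqrt"],
}

# 3 * 4 * 2 * 2 * 2 * 2 candidate settings
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is fitted once per cross-validation fold (cv=5)
n_fits = n_candidates * 5

print(n_candidates, n_fits)  # 192 960
```

The count excludes the final refit of the best candidate on the full training set, which GridSearchCV performs by default.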

Hyperparameter Tuning: GridSearchCV is used to find the best hyperparameters with 5-fold cross-validation:

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, scoring='accuracy', cv=5, verbose=1)

grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_

y_pred = best_model.predict(X_test)
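After fitting, the search object also exposes the winning settings and their mean cross-validated score. The sketch below shows how these might be inspected. To keep it quick to run, it uses a deliberately reduced grid whose values are hypothetical and not the ones from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(random_state=42)),
])

# Reduced grid with hypothetical values, chosen only to keep the example fast
small_grid = {
    "classifier__n_estimators": [10, 50],
    "classifier__max_depth": [None, 5],
}

grid_search = GridSearchCV(pipeline, small_grid, scoring="accuracy", cv=5)
grid_search.fit(X_train, y_train)

# Best hyperparameter combination and its mean cross-validated accuracy
print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))
```

Note that best_score_ is measured on the training folds only, so it is not a substitute for the test-set evaluation.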

Model Evaluation: Accuracy and a classification report are used to assess model performance on the test set.

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
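A confusion matrix is a possible complement to these two metrics, since it shows which species get mistaken for which. The notebook does not use one, so treat the sketch below as an optional addition. It trains a stand-in RandomForestClassifier with default hyperparameters rather than the tuned best_model, so that the example stays self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in model with default hyperparameters (not the tuned best_model)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes (0, 1, 2)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```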
