World Cup Cricket Prediction Machine Learning Model

This Jupyter Notebook illustrates a machine learning pipeline for analyzing World Cup match results. The pipeline consists of data loading, data preprocessing, feature selection, model training, hyperparameter tuning, and evaluation of a Random Forest classifier.

Dataset Description

The analysis utilizes two datasets:

  1. Match Results: Sourced from a CSV file named results.csv, which contains historical match data, including:

    • Date of the match
    • Teams involved (Team_1 and Team_2)
    • Winner of the match
    • Margin of victory
    • Venue (Ground)
  2. Team Statistics: Obtained from another CSV file named World_cup_2023.csv, which includes:

    • Team names
    • Team rankings
    • Number of titles won
    • Win percentages in One Day Internationals (ODI)
    • World Cup match statistics (matches played, won, lost, etc.)
    • Recent points and ratings

Load the Dataset

The datasets are imported using Pandas:

import pandas as pd 
match_results = pd.read_csv('results.csv') 
team_statistics = pd.read_csv('World_cup_2023.csv')

Feature Selection

The following features are selected for the machine learning model:

  • Encoded team identifiers for both teams
  • Team rankings
  • Win percentages in ODIs and World Cups
  • Recent points and ratings

The target variable is the encoded winner of the match.
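
The feature construction described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the two small frames stand in for the CSV files, and every column name other than Team_1 and Team_2 (for example Winner, Team_name, Team_ranking, ODI_win_pct) is a hypothetical placeholder, as are the values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny stand-in frames; the real notebook reads these from the CSV files.
# Column names other than Team_1 and Team_2 are hypothetical, values are made up.
match_results = pd.DataFrame({
    "Team_1": ["India", "Australia", "England"],
    "Team_2": ["Australia", "England", "India"],
    "Winner": ["India", "Australia", "India"],
})
team_statistics = pd.DataFrame({
    "Team_name": ["India", "Australia", "England"],
    "Team_ranking": [1, 2, 3],
    "ODI_win_pct": [0.55, 0.61, 0.51],
})

# Fit one encoder on all team names so that both team columns and the
# winner column share the same integer codes.
encoder = LabelEncoder()
encoder.fit(team_statistics["Team_name"])

features = match_results.copy()
for side in ("Team_1", "Team_2"):
    # Prefix the statistics columns so each side gets its own copy.
    stats = team_statistics.add_prefix(f"{side}_")
    features = features.merge(
        stats, left_on=side, right_on=f"{side}_Team_name", how="left"
    )
    features = features.drop(columns=f"{side}_Team_name")
    features[f"{side}_encoded"] = encoder.transform(features[side])

# Feature matrix: encoded teams plus per-team statistics.
X = features.drop(columns=["Team_1", "Team_2", "Winner"])
# Target: the encoded winner of each match.
y = encoder.transform(features["Winner"])
```

Note that LabelEncoder raises an error on labels it has not seen, so rows whose winner is not a team name (a tie or an abandoned match, for instance) would need to be filtered out first.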

Model Training and Evaluation

Train-Test Split

The dataset is split into training and testing sets so that the model’s performance is evaluated on matches it did not see during training.
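
A split of this kind might look like the sketch below. The source does not state the split ratio or seed, so test_size=0.2, random_state=42, and the stratify argument are assumptions, and the random arrays merely stand in for the real feature matrix and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real features and encoded winners.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 6))
y = rng.integers(0, 2, size=100)

# Hold out 20% of the rows for testing (assumed ratio); stratify keeps
# the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```

Stratification fails if any class has only one sample, which can happen with rarely winning teams; in that case the stratify argument would have to be dropped.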

Machine Learning Pipeline

A machine learning pipeline is created using Scikit-learn, which includes:

  • StandardScaler for feature scaling
  • RandomForestClassifier for classification

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42)),
])

Hyperparameter Tuning

GridSearchCV is used to find the optimal hyperparameters for the Random Forest model with 5-fold cross-validation, scored by accuracy. The grid covers the number of trees, the maximum tree depth, the minimum samples per split and per leaf, bootstrap sampling, and the number of features considered at each split:

from sklearn.model_selection import GridSearchCV

# The 'classifier__' prefix routes each parameter to the pipeline's classifier step
param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [None, 10, 20, 30],
    'classifier__min_samples_split': [5, 10],
    'classifier__min_samples_leaf': [2, 4],
    'classifier__bootstrap': [True, False],
    'classifier__max_features': ['log2', 'sqrt'],
}
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, scoring='accuracy', cv=5, verbose=1)
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_
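
To gauge the cost of the search before running it, the number of candidate combinations can be counted with scikit-learn's ParameterGrid. The sketch below copies the grid from this section and multiplies by the 5 folds set through cv=5; the variable names n_candidates and n_fits are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the one passed to GridSearchCV in this section.
param_grid = {
    'classifier__n_estimators': [50, 100, 200],
    'classifier__max_depth': [None, 10, 20, 30],
    'classifier__min_samples_split': [5, 10],
    'classifier__min_samples_leaf': [2, 4],
    'classifier__bootstrap': [True, False],
    'classifier__max_features': ['log2', 'sqrt'],
}

# Number of distinct hyperparameter combinations: 3 * 4 * 2 * 2 * 2 * 2
n_candidates = len(ParameterGrid(param_grid))
# Each candidate is trained once per cross-validation fold (cv=5).
n_fits = n_candidates * 5

print(n_candidates, n_fits)  # prints: 192 960
```

With 960 fits, passing n_jobs=-1 to GridSearchCV to use all CPU cores may shorten the search noticeably.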

Model Fitting

By default, GridSearchCV refits the best configuration on the full training set, so best_model can be used directly to make predictions on the test set:

y_pred = best_model.predict(X_test)

Model Evaluation

Accuracy and a classification report (per-class precision, recall, and F1-score) are used to assess model performance:

from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

In summary, the notebook covers each step of predicting World Cup match results with machine learning, from data loading through hyperparameter tuning to evaluation of the tuned Random Forest classifier.
