Predicting Wine Quality Using Machine Learning in Python

Predicting wine quality is a good way to practice an end-to-end machine learning workflow: loading and cleaning data, training a model, evaluating it, and tuning it.

Steps to solve the problem:

Step 1: Set Up Your Environment

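The first step is to make sure Python and the necessary libraries are installed. joblib, which is used later to save the model, is installed automatically as a scikit-learn dependency. Any missing packages can be installed with: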

pip install pandas numpy scikit-learn matplotlib seaborn
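To confirm the installation worked, a quick check like the one below can report which of these packages are present and their versions. It is a minimal sketch that relies only on the standard library's importlib.metadata (Python 3.8+); the REQUIRED list simply mirrors the pip command above.

```python
# Sketch: confirm the packages from the pip command are installed and report their versions.
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip (scikit-learn is imported as "sklearn")
REQUIRED = ["pandas", "numpy", "scikit-learn", "matplotlib", "seaborn"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name:>12}: {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the same pip command.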

Step 2: Load the Dataset

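The Wine Quality dataset (red and white variants of Portuguese "Vinho Verde" wine) is publicly available, for example from the UCI Machine Learning Repository. Download it and load it into a pandas DataFrame.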

import pandas as pd

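# Load the dataset
# (the original UCI files, winequality-red.csv and winequality-white.csv, are semicolon-separated: pass sep=";" for those)
data = pd.read_csv("winequality.csv")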

# Preview the first few rows
print(data.head())
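Copies of this dataset circulate with either commas or semicolons as the delimiter, and a wrong delimiter silently produces a single-column DataFrame. The sketch below, offered as one possible approach, lets pandas detect the delimiter (sep=None with engine="python" uses Python's csv.Sniffer) and gives a quick first look at missing values and at how the quality scores are distributed. The helper names load_wine_csv and quick_summary are hypothetical.

```python
import pandas as pd


def load_wine_csv(path):
    """Load a wine-quality CSV whose delimiter may be ',' or ';'.

    sep=None with the python engine makes pandas sniff the delimiter
    from the first line instead of assuming a comma.
    """
    return pd.read_csv(path, sep=None, engine="python")


def quick_summary(df, target="quality"):
    """Return missing-value counts per column and the share of each quality score."""
    missing = df.isnull().sum()
    class_share = df[target].value_counts(normalize=True).sort_index()
    return missing, class_share
```

Usage: `data = load_wine_csv("winequality.csv")`, then `missing, class_share = quick_summary(data)`. A skewed class_share is worth noting early, because it affects how the split and the accuracy score should be interpreted later.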

Step 3: Understand and Prepare the Data

We need to explore the data and clean it if needed:

1. Check for missing values:

print(data.isnull().sum())  # Count missing values per column
data.fillna(data.median(numeric_only=True), inplace=True)  # Replace missing numeric values with the column median

2. Separate features and target:

X = data.drop("quality", axis=1)  # Features
y = data["quality"]  # Target

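3. Standardize the features. Random forests do not need scaled inputs, but scaling does no harm here and matters for models such as logistic regression, SVMs, or k-nearest neighbors. To avoid leaking test-set statistics into training, the scaler is fitted in Step 4, after the split, on the training data only.

Step 4: Split the Data

Divide the dataset into training and testing sets to evaluate the model’s performance, then standardize both sets using statistics learned from the training set.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply the same transformation to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)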
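Wine quality scores are imbalanced (most wines sit in the middle of the scale), so a purely random split can leave rare scores under-represented in the test set. Passing stratify=y to train_test_split keeps the class proportions the same in both sets. The sketch below uses synthetic labels 5, 6 and 7 as a stand-in for the real data; the helper name stratified_split is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, test_size=0.2, random_state=42):
    """Split so each quality score keeps the same share in train and test."""
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )


# Synthetic stand-in data (NOT the wine dataset): 200 rows, imbalanced 3-class target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = np.array([5] * 120 + [6] * 60 + [7] * 20)

X_tr, X_te, y_tr, y_te = stratified_split(X_demo, y_demo)

# Print each label's share in the training and test sets
for label in np.unique(y_demo):
    print(label, round(np.mean(y_tr == label), 3), round(np.mean(y_te == label), 3))
```

Stratification requires every class to have at least two samples; scikit-learn raises a ValueError otherwise, so very rare scores may need to be merged with a neighboring score first.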

Step 5: Choose and Train a Machine Learning Model

Start with a Random Forest classifier, which usually performs well on tabular data with little tuning, to classify wine quality.

from sklearn.ensemble import RandomForestClassifier

# Initialize the model
model = RandomForestClassifier(random_state=42)

# Train the model
model.fit(X_train, y_train)
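A random forest can also estimate its accuracy without touching the test set: with oob_score=True, each training sample is predicted only by the trees whose bootstrap sample did not include it. The sketch below shows this on synthetic stand-in data; for the wine data the equivalent would be `RandomForestClassifier(random_state=42, oob_score=True)` followed by reading `model.oob_score_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (NOT the wine dataset): a simple two-class problem
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

# oob_score=True predicts each training sample using only the trees that did not see it
oob_model = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
oob_model.fit(X_demo, y_demo)
print("Out-of-bag accuracy:", round(oob_model.oob_score_, 3))
```

This relies on bootstrapping, which is the default (bootstrap=True) for RandomForestClassifier.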

Step 6: Evaluate the Model

Next, check how well the model performs using accuracy, along with the per-class precision, recall, and F1-score reported by the classification report.

from sklearn.metrics import accuracy_score, classification_report

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate performance
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
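Because the quality classes are imbalanced, always predicting the most common score already gives a non-trivial accuracy, so it helps to compare the forest against such a baseline. The sketch below does this with scikit-learn's DummyClassifier and adds a confusion matrix showing which classes get confused. It runs on synthetic stand-in data generated with make_classification; the class weights are illustrative assumptions, not the real wine distribution.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the wine dataset): three imbalanced classes
X_demo, y_demo = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Baseline: always predict the most frequent class seen in training
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
forest = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
forest_pred = forest.predict(X_te)
forest_acc = accuracy_score(y_te, forest_pred)
print("Baseline accuracy:", round(baseline_acc, 3))
print("Random forest accuracy:", round(forest_acc, 3))

# Rows are true classes, columns are predicted classes (sorted label order)
cm = confusion_matrix(y_te, forest_pred)
print(cm)
```

If the forest only barely beats the baseline, the per-class recall in the classification report usually shows that the rare scores are being missed.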

Step 7: Optimize the Model

In this step, we try to improve the model’s performance with hyperparameter tuning, using a grid search with cross-validation on the training set.

from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10]
}

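# Perform grid search (3-fold cross-validation on the training set only)
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best CV Accuracy:", grid_search.best_score_)

# Keep the tuned model (refit on the whole training set) for the following steps
model = grid_search.best_estimator_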
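The grid in this step has 3 × 3 × 3 = 27 combinations, so with cv=3 GridSearchCV fits 81 models before the final refit. When the grid grows, scikit-learn's RandomizedSearchCV is a cheaper alternative that samples a fixed number of combinations. The sketch below uses synthetic stand-in data, and the search ranges are illustrative assumptions rather than values tuned for wine.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (NOT the wine dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

# Illustrative search space: ranges are assumptions, not tuned values for wine data
param_distributions = {
    "n_estimators": randint(50, 201),     # integers from 50 to 200 inclusive
    "max_depth": [None, 10, 20],           # list entries are sampled uniformly
    "min_samples_split": randint(2, 11),  # integers from 2 to 10 inclusive
}

# Evaluate only n_iter random combinations instead of every point of a full grid
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=8,
    cv=3,
    random_state=42,
)
random_search.fit(X_demo, y_demo)

print("Best Parameters:", random_search.best_params_)
print("Best CV Accuracy:", round(random_search.best_score_, 3))
```

With the real data, X_train and y_train would replace the synthetic arrays; n_iter trades search thoroughness against run time.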

Step 8: Visualize Insights

Plot feature importance to understand which factors influence wine quality the most.

import matplotlib.pyplot as plt
import seaborn as sns

# Get feature importance
importances = model.feature_importances_
feature_names = data.drop("quality", axis=1).columns  # Same columns, in the same order, as X

# Plot the importance
sns.barplot(x=importances, y=feature_names)
plt.title("Feature Importance")
plt.show()
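The importances plotted here are scikit-learn's impurity-based feature_importances_, which are computed on the training data and can favor features with many distinct values. Permutation importance, measured on held-out data, is a common cross-check: it shuffles one feature at a time and records how much the score drops. The sketch below uses synthetic stand-in data in which only the first two columns carry signal; the feature_i names are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the wine dataset). With shuffle=False the two
# informative columns come first; the remaining three are pure noise.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=5, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

forest = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in accuracy
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=42)
perm_importances = pd.Series(result.importances_mean, index=feature_names)
perm_importances = perm_importances.sort_values(ascending=False)
print(perm_importances)
```

For the wine model, the call would be `permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)`, and the resulting Series can be plotted with the same sns.barplot call.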

Step 9: Save the Trained Model

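Save the trained model so it can be reused later without retraining. The model expects standardized inputs, so save the fitted scaler too; it is needed to transform new raw samples before prediction.

import joblib

# Save the model and the fitted scaler
joblib.dump(model, "wine_quality_model.pkl")
joblib.dump(scaler, "wine_quality_scaler.pkl")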
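Because the model was trained on standardized features, new raw measurements must go through the same scaling before prediction. One way to keep the two steps together, sketched here as an alternative rather than the tutorial's method, is a scikit-learn Pipeline saved as a single file. The data is a synthetic stand-in and the file name is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the wine dataset), unscaled raw features
X_raw, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Scaling and classification are fitted together and applied together at predict time
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))
pipeline.fit(X_raw, y_demo)

# Hypothetical file name; a temporary directory keeps the sketch self-contained
path = os.path.join(tempfile.mkdtemp(), "wine_quality_pipeline.pkl")
joblib.dump(pipeline, path)

restored = joblib.load(path)
print(restored.predict(X_raw[:3]))  # raw rows go in, no manual scaling needed
```

Fitting the pipeline on the unscaled training split also guarantees that the scaler only ever sees training data, including inside cross-validation.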

Step 10: Load and Use the Model

Lastly, load the saved model and make predictions.

import joblib

# Load the saved model
model = joblib.load("wine_quality_model.pkl")

# Predict one sample from the test set (already standardized, so no extra scaling is needed)
sample = X_test[0].reshape(1, -1)
predicted_quality = model.predict(sample)
print("Predicted Quality:", predicted_quality)
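The prediction above is a single label. A random forest can also report how strongly it favors each class through predict_proba (the share of tree votes per class, averaged over trees), which helps flag borderline wines. The sketch below uses synthetic stand-in data, so its class labels are 0, 1 and 2 rather than quality scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (NOT the wine dataset): three classes labeled 0, 1, 2
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_classes=3, random_state=0
)
model_demo = RandomForestClassifier(random_state=42).fit(X_demo, y_demo)

sample = X_demo[:1]  # 2-D slice with one row, equivalent to X_demo[0].reshape(1, -1)
probabilities = model_demo.predict_proba(sample)[0]

# classes_ gives the label that each probability column refers to
for label, p in zip(model_demo.classes_, probabilities):
    print(f"class {label}: {p:.2f}")
```

For the wine model, `model.predict_proba(sample)` returns one column per quality score listed in `model.classes_`.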

Conclusion:

This tutorial showed how machine learning can address a practical problem in the food and beverage industry. By analyzing the chemical properties of wine and applying a classification algorithm, we can predict its quality from measurable data. With proper preprocessing, model selection, and optimization, machine learning can become a useful tool for improving production processes and maintaining product quality.