Machine Learning Model for Gender Classification Using Human Speech Data

Gender_Recognition.py

This model uses Logistic Regression, a Machine Learning classification algorithm, to classify a speaker's gender from pre-processed speech signal data.

The project is written in Python, and the model was trained and tested in a Jupyter Notebook environment.

ABOUT THE DATASET

The dataset contains 20 data features and 1 target feature (the data label).

The output of the data.info() command lists all the data attributes along with their non-null counts and data types:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3168 entries, 0 to 3167
Data columns (total 21 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   meanfreq  3168 non-null   float64
 1   sd        3168 non-null   float64
 2   median    3168 non-null   float64
 3   Q25       3168 non-null   float64
 4   Q75       3168 non-null   float64
 5   IQR       3168 non-null   float64
 6   skew      3168 non-null   float64
 7   kurt      3168 non-null   float64
 8   sp.ent    3168 non-null   float64
 9   sfm       3168 non-null   float64
 10  mode      3168 non-null   float64
 11  centroid  3168 non-null   float64
 12  meanfun   3168 non-null   float64
 13  minfun    3168 non-null   float64
 14  maxfun    3168 non-null   float64
 15  meandom   3168 non-null   float64
 16  mindom    3168 non-null   float64
 17  maxdom    3168 non-null   float64
 18  dfrange   3168 non-null   float64
 19  modindx   3168 non-null   float64
 20  label     3168 non-null   object 
dtypes: float64(20), object(1)
memory usage: 519.9+ KB

INTRODUCTION

Initially, I imported all the libraries required for reading the data, pre-processing it, plotting the data matrices, and splitting the data into training and testing sets. The following steps were performed during model training:

Step 1: Reading the data from a .csv file (voice.csv) using the pandas library

Step 2: Checking whether the data needs any pre-processing

Step 3: Normalizing the data to reduce the processing cost

Step 4: Splitting the data into training and testing sets

Step 5: Training the model using the well-known classification algorithm Logistic Regression
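
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code of Gender_Recognition.py. It assumes that the label column holds the strings "male" and "female", that the normalization is min-max scaling, and that the split is 80/20. The helper names (prepare_features, train_and_score, make_demo_frame) are hypothetical, and the demo frame is synthetic stand-in data so the sketch runs without voice.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def prepare_features(data: pd.DataFrame):
    """Steps 2-3: check for missing values, encode the label, scale the features."""
    # Step 2: fail early if any column still contains missing values.
    if data.isnull().values.any():
        raise ValueError("dataset contains missing values")
    # Assumption: labels are the strings "male" and "female" (male -> 1, female -> 0).
    y = (data["label"] == "male").astype(int).to_numpy()
    x = data.drop(columns=["label"])
    # Step 3 (assumed min-max normalization): rescale every feature to [0, 1].
    span = (x.max() - x.min()).replace(0, 1)  # guard against constant columns
    x = (x - x.min()) / span
    return x.to_numpy(), y


def train_and_score(data: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Steps 4-5: split the data, fit LogisticRegression, return (model, test accuracy)."""
    x, y = prepare_features(data)
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)
    return model, model.score(x_test, y_test)


def make_demo_frame(n: int = 200, seed: int = 0) -> pd.DataFrame:
    """Build a small synthetic stand-in for voice.csv (hypothetical values)."""
    rng = np.random.default_rng(seed)
    labels = np.array(["male", "female"] * (n // 2))
    # Synthetic 'meanfun' values: lower for "male" rows, higher for "female" rows.
    meanfun = np.where(labels == "male", 0.11, 0.19) + rng.normal(0, 0.01, n)
    meanfreq = rng.normal(0.18, 0.03, n)
    return pd.DataFrame({"meanfreq": meanfreq, "meanfun": meanfun, "label": labels})


if __name__ == "__main__":
    # Step 1: with the real file, use data = pd.read_csv("voice.csv") instead.
    data = make_demo_frame()
    model, accuracy = train_and_score(data)
    print(f"test accuracy: {accuracy:.3f}")
```

Passing stratify=y keeps the class proportions the same in the training and testing sets, which is a design choice of this sketch rather than something stated for the original project.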

RESULTS

The final test accuracy of the model is 97.791% when the whole logistic regression code is written from scratch, using the sigmoid function as the logistic function. With logistic regression imported from sklearn.linear_model, the test accuracy rises to 98.26%, an increase of about 0.47 percentage points.
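
For readers who want to see what a from-scratch version built on the sigmoid function can look like, here is a minimal sketch using NumPy and batch gradient descent. It is an assumption about the general approach, not the project's actual code. The learning rate, iteration count, function names, and synthetic demo data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, learning_rate=0.5, n_iters=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = x.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(x @ w + b)   # predicted probability of class 1
        error = p - y            # gradient of the log loss w.r.t. the logit
        w -= learning_rate * (x.T @ error) / n_samples
        b -= learning_rate * error.mean()
    return w, b


def predict(x, w, b):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return (sigmoid(x @ w + b) >= 0.5).astype(int)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


def make_demo_data(n_per_class=100, seed=0):
    """Two synthetic, already normalized clusters (hypothetical data)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.25, 0.05, size=(n_per_class, 2))  # class 0
    x1 = rng.normal(0.75, 0.05, size=(n_per_class, 2))  # class 1
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return x, y


if __name__ == "__main__":
    x, y = make_demo_data()
    w, b = fit_logistic(x, y)
    print(f"training accuracy: {accuracy(y, predict(x, w, b)):.3f}")
```

A library implementation such as the one in sklearn.linear_model typically differs from a plain gradient descent loop in its solver and in applying regularization by default, which may account for part of a small accuracy gap like the one reported above.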
