Coders Packet

Machine Learning Model for Gender Classification Using Human Speech Data

By Mukul Kumar

This model uses Logistic Regression, a machine learning classification algorithm, to classify a speaker's gender from pre-processed speech-signal data.

This project is written in Python using the well-known classification algorithm Logistic Regression. The model was trained and tested in a Jupyter Notebook environment.


The dataset contains 20 data features and 1 target feature (the data label). The command output below lists all the data attributes along with their non-null counts and data types:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3168 entries, 0 to 3167
Data columns (total 21 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   meanfreq  3168 non-null   float64
 1   sd        3168 non-null   float64
 2   median    3168 non-null   float64
 3   Q25       3168 non-null   float64
 4   Q75       3168 non-null   float64
 5   IQR       3168 non-null   float64
 6   skew      3168 non-null   float64
 7   kurt      3168 non-null   float64
 8   sp.ent    3168 non-null   float64
 9   sfm       3168 non-null   float64
 10  mode      3168 non-null   float64
 11  centroid  3168 non-null   float64
 12  meanfun   3168 non-null   float64
 13  minfun    3168 non-null   float64
 14  maxfun    3168 non-null   float64
 15  meandom   3168 non-null   float64
 16  mindom    3168 non-null   float64
 17  maxdom    3168 non-null   float64
 18  dfrange   3168 non-null   float64
 19  modindx   3168 non-null   float64
 20  label     3168 non-null   object 
dtypes: float64(20), object(1)
memory usage: 519.9+ KB


Initially, I imported all the libraries needed for reading the data, pre-processing it, plotting the data matrices, splitting it into training and testing sets, and so on. The following steps were performed during model training:

Step 1: Reading the data, a .csv file (voice.csv), using the pandas library

Step 2: Checking the data to see whether any pre-processing is needed

Step 3: Normalizing the data to reduce the processing cost

Step 4: Splitting the data into training and testing sets

Step 5: Training the model with the Logistic Regression classification algorithm
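
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the function name prepare_and_train, the min-max form of normalization, the 80/20 split, the random seed, and the label values 'male' and 'female' are all assumptions. A small synthetic frame stands in for voice.csv so the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def prepare_and_train(df, test_size=0.2, seed=42):
    """Steps 2-5 on a DataFrame shaped like voice.csv (features + 'label')."""
    # Step 2: confirm there are no missing values before going further.
    assert df.isnull().sum().sum() == 0, "unexpected missing values"

    # Encode the text label as 0/1 (assumes the values 'male' and 'female').
    y = (df["label"] == "male").astype(int).to_numpy()
    x = df.drop(columns=["label"])

    # Step 3: min-max normalization (an assumed choice), scaling to [0, 1].
    x = (x - x.min()) / (x.max() - x.min())

    # Step 4: hold out part of the data for testing.
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_size, random_state=seed
    )

    # Step 5: fit logistic regression and measure accuracy on the test set.
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)
    return model, model.score(x_test, y_test)


# Step 1 on the real data would be: df = pd.read_csv("voice.csv")
# Synthetic stand-in with two of the dataset's column names (values invented).
rng = np.random.default_rng(0)
n = 200
is_male = rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "meanfun": rng.normal(0.17, 0.02, n) - 0.06 * is_male,
    "IQR": rng.normal(0.06, 0.02, n) + 0.04 * is_male,
    "label": np.where(is_male == 1, "male", "female"),
})
model, test_accuracy = prepare_and_train(demo)
print(f"test accuracy on synthetic data: {test_accuracy:.3f}")
```

The sketch normalizes before splitting because that is the order of the steps listed above. Computing the scaling statistics on the training set only would be a reasonable alternative.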


The final test accuracy of the model is 97.791% when the whole logistic regression code is written from scratch, using the sigmoid function as the logistic function. When logistic regression is instead imported from sklearn.linear_model, the test accuracy rises to 98.26%, an increase of about half a percentage point.
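
For readers curious what "from scratch" can look like, here is a minimal sketch of logistic regression built on the sigmoid function and trained by batch gradient descent. It is an illustration under assumptions, not the author's code: the function names, learning rate, iteration count, and the synthetic demo data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(x, y, learning_rate=0.5, iterations=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n_samples, n_features = x.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(iterations):
        p = sigmoid(x @ w + b)   # predicted probability of class 1
        error = p - y            # derivative of the log loss w.r.t. x @ w + b
        w -= learning_rate * (x.T @ error) / n_samples
        b -= learning_rate * error.mean()
    return w, b


def predict(x, w, b):
    """Label a sample 1 when its predicted probability is at least 0.5."""
    return (sigmoid(x @ w + b) >= 0.5).astype(int)


# Tiny synthetic check: two features in [0, 1], class 1 when x0 > x1.
rng = np.random.default_rng(0)
x_demo = rng.random((300, 2))
y_demo = (x_demo[:, 0] > x_demo[:, 1]).astype(int)
w, b = train_logistic_regression(x_demo, y_demo)
accuracy = (predict(x_demo, w, b) == y_demo).mean()
print(f"training accuracy on synthetic data: {accuracy:.3f}")
```

On the real dataset, x would be the normalized feature matrix and y the 0/1 encoded label. A plain implementation like this has no regularization and uses a fixed number of iterations, which is one plausible reason a library solver can score slightly differently.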






Submitted by Mukul Kumar (mukul102000)
