Machine Learning Model for Gender Classification Using Human Speech Data
This model uses the Logistic Regression classification algorithm to classify a speaker's gender from pre-processed speech signal data.
The project is written in Python, and the model was trained and tested in a Jupyter Notebook environment.
ABOUT THE DATASET
The dataset contains 20 numeric features and 1 target feature (the data label, which holds the gender class).
The output of the data.info() command lists every column along with its non-null count and data type:
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3168 entries, 0 to 3167
Data columns (total 21 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   meanfreq  3168 non-null   float64
 1   sd        3168 non-null   float64
 2   median    3168 non-null   float64
 3   Q25       3168 non-null   float64
 4   Q75       3168 non-null   float64
 5   IQR       3168 non-null   float64
 6   skew      3168 non-null   float64
 7   kurt      3168 non-null   float64
 8   sp.ent    3168 non-null   float64
 9   sfm       3168 non-null   float64
 10  mode      3168 non-null   float64
 11  centroid  3168 non-null   float64
 12  meanfun   3168 non-null   float64
 13  minfun    3168 non-null   float64
 14  maxfun    3168 non-null   float64
 15  meandom   3168 non-null   float64
 16  mindom    3168 non-null   float64
 17  maxdom    3168 non-null   float64
 18  dfrange   3168 non-null   float64
 19  modindx   3168 non-null   float64
 20  label     3168 non-null   object
dtypes: float64(20), object(1)
memory usage: 519.9+ KB
```
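Because label is the only non-numeric (object) column, it has to be converted to numbers before Logistic Regression can use it. Below is a minimal sketch, assuming the label values are the strings "male" and "female" (verify this with data['label'].unique() on your copy of voice.csv); the helper name split_features_and_label is hypothetical, not taken from the project notebook.

```python
import pandas as pd


def split_features_and_label(data: pd.DataFrame):
    """Separate the numeric feature columns from the 'label' target column.

    Assumption: 'label' holds the strings 'male' and 'female'. Any other value
    raises a ValueError instead of silently turning into NaN.
    """
    # Encode the target as integers: 1 for male, 0 for female (an arbitrary, fixed choice).
    y = data["label"].map({"male": 1, "female": 0})
    if y.isna().any():
        unexpected = list(data.loc[y.isna(), "label"].unique())
        raise ValueError(f"unexpected label values: {unexpected}")
    # Every remaining column is a float64 feature, as shown by data.info().
    X = data.drop(columns=["label"])
    return X.to_numpy(dtype=float), y.to_numpy(dtype=int)


# Usage with the real file (not executed here):
# data = pd.read_csv("voice.csv")
# X, y = split_features_and_label(data)   # X: (3168, 20), y: (3168,)
```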
INTRODUCTION
First, I imported all the libraries needed for reading and pre-processing the data, plotting the data matrices, and splitting the data into training and testing sets. The following steps were performed during model training:
Step 1: Read the data from the CSV file (voice.csv) using the pandas library.
Step 2: Check whether the data needs any pre-processing.
Step 3: Normalize the features so that they share a common scale, which lowers the cost of training (gradient descent converges faster on scaled features).
Step 4: Split the data into training and testing sets.
Step 5: Train the model on the training set using the Logistic Regression classification algorithm.
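Steps 3 and 4 can be sketched with NumPy alone. The README does not say which normalization or split ratio was used, so this sketch assumes min-max scaling to [0, 1] and a hypothetical 80/20 split with a fixed seed; the function names are illustrative. Unlike the step order above, the sketch splits first and then computes the scaling statistics on the training rows only, which keeps test-set information out of training.

```python
import numpy as np


def train_test_split_np(X, y, test_fraction=0.2, seed=42):
    """Shuffle rows and split NumPy arrays into train/test sets (assumed 80/20 default)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))          # shuffled row indices
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def min_max_normalize(X_train, X_test):
    """Scale each feature with the training set's min and max.

    Training rows end up in [0, 1]; test rows may fall slightly outside that range.
    """
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    # Constant columns would otherwise cause a division by zero.
    col_range[col_range == 0] = 1.0
    return (X_train - col_min) / col_range, (X_test - col_min) / col_range
```

Here X is the feature matrix and y the 0/1 label vector, both as NumPy arrays. Calling X_train, X_test, y_train, y_test = train_test_split_np(X, y) followed by X_train, X_test = min_max_normalize(X_train, X_test) would give 2,534 training rows and 634 test rows for the 3,168-row dataset under the assumed 0.2 test fraction.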
RESULTS
When the whole logistic regression was written from scratch, using the sigmoid function as the logistic function, the final test accuracy of the model was 97.791%. Importing LogisticRegression from sklearn.linear_model instead raised the test accuracy to 98.26%, an improvement of about 0.47 percentage points.
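The from-scratch approach described above can be sketched as batch gradient descent on the mean binary cross-entropy loss, with the sigmoid 1 / (1 + e^(-z)) as the logistic function. This is an illustrative sketch, not the project's code: the learning rate, iteration count, and function names are assumptions and are not the settings behind the 97.791% figure.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping any real input into (0, 1)."""
    z = np.clip(z, -500, 500)  # avoid overflow warnings in np.exp for extreme inputs
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.1, n_iterations=5000):
    """Fit weights and bias by batch gradient descent on the mean cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iterations):
        p = sigmoid(X @ w + b)   # predicted probability of class 1 for every sample
        error = p - y            # per-sample gradient of the cross-entropy w.r.t. the logit
        w -= learning_rate * (X.T @ error) / n_samples
        b -= learning_rate * error.mean()
    return w, b


def predict(X, w, b, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return float(np.mean(y_true == y_pred))
```

With normalized training and test arrays, w, b = fit_logistic_regression(X_train, y_train) followed by accuracy(y_test, predict(X_test, w, b)) mirrors the from-scratch workflow. For the library version, LogisticRegression().fit(X_train, y_train).score(X_test, y_test) from sklearn.linear_model returns the test accuracy directly. Note that scikit-learn's LogisticRegression applies L2 regularization by default (C=1.0) and uses the lbfgs solver, so the two versions do not optimize exactly the same objective and their accuracies can differ slightly.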