By Mukul Kumar
This project uses Logistic Regression, a well-known Machine Learning classification algorithm, to classify a speaker's gender from pre-processed speech-signal data.
It is written in Python, and the model was trained and tested in the Jupyter Notebook environment.
ABOUT THE DATASET
The dataset contains 20 data features and 1 target feature (data label)
The output of the data.info() command lists every data attribute along with its non-null count and data type:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3168 entries, 0 to 3167
Data columns (total 21 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   meanfreq  3168 non-null   float64
 1   sd        3168 non-null   float64
 2   median    3168 non-null   float64
 3   Q25       3168 non-null   float64
 4   Q75       3168 non-null   float64
 5   IQR       3168 non-null   float64
 6   skew      3168 non-null   float64
 7   kurt      3168 non-null   float64
 8   sp.ent    3168 non-null   float64
 9   sfm       3168 non-null   float64
 10  mode      3168 non-null   float64
 11  centroid  3168 non-null   float64
 12  meanfun   3168 non-null   float64
 13  minfun    3168 non-null   float64
 14  maxfun    3168 non-null   float64
 15  meandom   3168 non-null   float64
 16  mindom    3168 non-null   float64
 17  maxdom    3168 non-null   float64
 18  dfrange   3168 non-null   float64
 19  modindx   3168 non-null   float64
 20  label     3168 non-null   object
dtypes: float64(20), object(1)
memory usage: 519.9+ KB
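The same facts can be collected programmatically. The following is a minimal sketch, not the project's own code: the helper name `summarize` is hypothetical, and it assumes the target column is called `label`, as in the listing above.

```python
import pandas as pd


def summarize(df, target="label"):
    """Collect the facts that data.info() reports into a plain dict.

    `summarize` is a hypothetical helper; `target` is assumed to be the
    name of the label column.
    """
    features = df.drop(columns=[target])
    return {
        "rows": len(df),
        "features": features.shape[1],
        # Total number of missing cells across all columns.
        "missing": int(df.isnull().sum().sum()),
        # True when every feature column has a numeric dtype.
        "all_numeric": all(
            pd.api.types.is_numeric_dtype(t) for t in features.dtypes
        ),
        # Number of samples per class in the target column.
        "class_counts": df[target].value_counts().to_dict(),
    }


# Usage, assuming voice.csv is in the working directory:
#   df = pd.read_csv("voice.csv")
#   df.info()
#   print(summarize(df))
```

For the dataset described here, the listing above implies 3168 rows, 20 features, and no missing values.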
Initially, I imported all the libraries required for reading the data, pre-processing it, plotting the data matrices, and splitting the data into training and testing sets. The following steps were performed during model training:
Step 1: Reading the data, a .csv file (voice.csv), using the pandas library
Step 2: Checking whether the data needs any pre-processing
Step 3: Normalizing the data to reduce the processing cost
Step 4: Splitting the data into training and testing sets
Step 5: Training the model using the well-known classification algorithm Logistic Regression to obtain the best-fitting model
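Steps 2 to 5 can be sketched as follows. This is an illustration under stated assumptions, not the author's exact code: the function name `train_voice_classifier` is hypothetical, and min-max scaling, the 80/20 split, the random seed, `max_iter=1000`, and the `"male"`/`"female"` label values are all assumptions, since the text does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_voice_classifier(df, target="label", test_size=0.2, seed=42):
    """Encode labels, normalize, split, and fit (hypothetical helper)."""
    # Step 2: the label is the only non-numeric column; map it to 0/1.
    # Assumption: the label values are the strings "male" and "female".
    y = (df[target] == "male").astype(int).to_numpy()
    X = df.drop(columns=[target]).to_numpy(dtype=float)

    # Step 3: min-max normalization, so every feature lies in [0, 1].
    # Assumption: the source does not say which normalization was used.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    X = (X - lo) / span

    # Step 4: hold out a test set (assumed 80/20, stratified by class).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Step 5: fit scikit-learn's logistic regression on the training set.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Return the fitted model and its accuracy on the held-out test set.
    return model, model.score(X_test, y_test)


# Usage, assuming voice.csv is in the working directory:
#   model, accuracy = train_voice_classifier(pd.read_csv("voice.csv"))
```

The sketch normalizes before splitting to follow the order of the steps listed above; fitting the scaling on the training set only is a common alternative that keeps test-set statistics out of training.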
The final test accuracy of the model is 97.791% when the whole logistic regression code is written from scratch, using the sigmoid function as our logistic function. When LogisticRegression is imported from sklearn.linear_model instead, the test accuracy rises to 98.26%, an increase of about 0.47 percentage points.
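A from-scratch logistic regression built on the sigmoid function might look like the sketch below. It is a minimal illustration, not the author's implementation: batch gradient descent, the learning rate `lr`, the iteration count `n_iter`, and all function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights and bias by batch gradient descent on the log loss.

    X is an (n_samples, n_features) array of normalized features and
    y is an array of 0/1 labels. lr and n_iter are assumed values.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)  # predicted probability of class 1
        error = p - y           # derivative of the log loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict(X, w, b):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted class matches the label."""
    return float((predict(X, w, b) == y).mean())
```

scikit-learn's LogisticRegression uses a different solver and applies L2 regularization by default, which is one plausible reason the two approaches report slightly different test accuracies.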