Build an emotion and gender classifier for human speech with good accuracy, trained on the RAVDESS dataset using librosa
RAVDESS is one of the most widely used speech-emotion datasets, and it is easy to find on the web. It is well liked for the quality of its speakers and recordings, and it features 24 actors of both genders.
Because of how the dataset is packaged and how the audio filenames are formatted, RAVDESS requires a few extra parsing steps. After parsing the package, we extracted the important features of each audio clip (mfcc, mel, and chroma), which help to uniquely characterize a person's voice.
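RAVDESS encodes all of its labels in the filename itself, as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). A minimal parsing sketch, assuming the standard audio-only filenames such as 03-01-06-01-02-01-12.wav; the function name here is illustrative:

```python
import os

# Emotion codes used by the RAVDESS filename convention
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(path):
    """Extract emotion and gender from a RAVDESS filename.

    Filenames look like 03-01-06-01-02-01-12.wav, where the third
    field is the emotion code and the last field is the actor number
    (odd-numbered actors are male, even-numbered actors are female).
    """
    parts = os.path.basename(path).split(".")[0].split("-")
    emotion = EMOTIONS[parts[2]]
    actor = int(parts[6])
    gender = "female" if actor % 2 == 0 else "male"
    return emotion, gender

print(parse_ravdess_filename("03-01-06-01-02-01-12.wav"))  # ('fearful', 'female')
```

Looping this over every .wav file in the extracted package gives one (emotion, gender) pair per clip.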
Early in this project, I learned the hard way that male and female speakers must be handled separately, or the model struggles to reach good accuracy. So we labeled each speaker's gender along with their voice emotion, to make the data more robust.
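One simple way to realize that labeling is to fold the gender into the target label, so a single classifier learns gender-specific emotion classes. A minimal sketch (the label strings are illustrative):

```python
from itertools import product

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def make_label(gender, emotion):
    """Combine gender and emotion into one class label,
    e.g. ('male', 'angry') -> 'male_angry'."""
    return f"{gender}_{emotion}"

# 2 genders x 8 emotions -> 16 combined classes
classes = [make_label(g, e) for g, e in product(["male", "female"], EMOTIONS)]
print(len(classes))  # 16
```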
Next comes the data visualization: plotting each voice's waveplot and its corresponding mel, mfcc, and chroma features, in order to understand them in depth and appreciate what each feature contributes. These plots also helped us understand the basic differences between male and female voices.
We then define a function that extracts those three features from an audio file and apply it to all 1440 audio files, storing the results in a matrix. After splitting the dataset and applying feature scaling, we trained three supervised classifiers, namely Gradient Boosting, MLP, and XGBoost. XGBoost gave the best accuracy, so we used it later when defining the predictive function.
And now the best part: I recorded my own voice so that the model could predict its emotion and gender, and it returned exactly the expected outcome, confirming that the model works on unseen recordings.
Submitted by Shikhar Mishra (brilliant902)
Download packets of source code on Coders Packet