We create a convolutional neural network for trigger word detection, as used by popular voice assistants. On hearing the activation keyword, it plays a chime sound.
Trigger word detection in Python with TensorFlow and Keras
About the dataset:
The dataset contains three types of sounds: activates, negatives, and background noise. We build the training set by randomly overlaying activate and negative clips onto background noise. Because each clip is overlaid on the background rather than appended to it, the length of the recording does not change. The background recordings are at most 10 s long.
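The overlay step can be sketched with plain NumPy arrays. This is a minimal illustration under assumptions, not the project's actual code: the sample rate, the `overlay_clip` helper, and the random noise standing in for real recordings are all hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000      # assumed sample rate in Hz (not stated in the write-up)
BACKGROUND_SECONDS = 10   # background recordings are at most 10 s long


def overlay_clip(background, clip, rng):
    """Mix `clip` into `background` at a random offset, keeping the length fixed.

    Returns the mixed signal and the (start, end) sample indices of the clip.
    """
    if len(clip) > len(background):
        raise ValueError("clip must not be longer than the background")
    start = int(rng.integers(0, len(background) - len(clip) + 1))
    end = start + len(clip)
    mixed = background.copy()
    mixed[start:end] += clip   # summing overlays the clip; nothing is appended
    return mixed, (start, end)


rng = np.random.default_rng(0)
# Random noise stands in for real recordings in this sketch.
background = 0.01 * rng.standard_normal(SAMPLE_RATE * BACKGROUND_SECONDS)
activate = 0.1 * rng.standard_normal(SAMPLE_RATE)        # about 1 s
negative = 0.1 * rng.standard_normal(SAMPLE_RATE // 2)   # about 0.5 s

example, activate_span = overlay_clip(background, activate, rng)
example, negative_span = overlay_clip(example, negative, rng)
```

This sketch does not stop two clips from landing on top of each other; a fuller version would probably re-draw the offset until the new span is free.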
Preprocessing:
Each training example is stored together with an array holding its corresponding labels y. We also normalize the audio clips so that their loudness matches a target amplitude.
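A rough sketch of both preprocessing steps follows. Everything specific in it is an assumption made for illustration: the target level of -20 dBFS, the helper names, and the labeling scheme (a per-time-step array that is set to 1 for a short window right after an activate clip ends) are not given in the write-up.

```python
import numpy as np

TARGET_DBFS = -20.0   # assumed target level; the write-up gives no value


def rms_dbfs(signal):
    """RMS level of a float signal in dB relative to full scale (1.0)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(max(rms, 1e-12))   # floor avoids log10(0) on silence


def match_target_amplitude(signal, target_dbfs=TARGET_DBFS):
    """Scale `signal` so that its RMS level equals `target_dbfs`."""
    gain_db = target_dbfs - rms_dbfs(signal)
    return signal * 10.0 ** (gain_db / 20.0)


def make_labels(num_steps, activate_end_steps, width=50):
    """Assumed labeling: mark `width` steps after each activate clip ends."""
    y = np.zeros((num_steps, 1), dtype=np.float32)
    for end in activate_end_steps:
        y[end + 1 : end + 1 + width, 0] = 1.0   # slice is clipped at the array end
    return y


rng = np.random.default_rng(0)
clip = 0.3 * rng.standard_normal(16_000)   # random noise standing in for audio
normalized = match_target_amplitude(clip)

# 1375 output steps and an activate ending at step 400 are hypothetical values.
y = make_labels(num_steps=1375, activate_end_steps=[400])
```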
Model:
The model we use is Conv1D -> BatchNormalization -> Activation -> Dropout -> [GRU -> Dropout -> BatchNormalization] x 3 -> Dropout.
It is a 14-layer neural network.
Layers:
1) Conv1D layer: A one-dimensional convolutional layer. A filter of a chosen length slides along a single axis of the input, here the time axis.
2) BatchNormalization layer: Normalizes its inputs using the mean and variance computed over each mini-batch. This stabilizes training and also has a mild regularizing effect, which helps reduce overfitting.
3) Activation: Applies a nonlinear function to its input, which lets the network learn complex patterns.
4) Dropout layer: Used to prevent overfitting. During training, a random fraction of the layer's inputs is set to 0.
5) GRU (gated recurrent unit) layer: A recurrent layer similar to an LSTM. Each unit has a reset gate and an update gate.
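The layer stack described above can be sketched in Keras as follows. The layer order comes from the write-up; the input shape, filter count, kernel size, stride, number of GRU units, dropout rate, ReLU activation, and the final per-time-step sigmoid output are assumptions chosen for illustration and may differ from the submitted code.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed sizes (not stated in the write-up): 5511 spectrogram time steps
# with 101 frequency bins per step.
TX, N_FREQ = 5511, 101


def build_model(input_shape=(TX, N_FREQ), dropout_rate=0.8):
    """Conv1D front end followed by three GRU blocks (sizes are assumptions)."""
    inputs = tf.keras.Input(shape=input_shape)

    # Conv1D -> BatchNormalization -> Activation -> Dropout
    x = layers.Conv1D(filters=196, kernel_size=15, strides=4)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(dropout_rate)(x)

    # [GRU -> Dropout -> BatchNormalization] x 3
    for _ in range(3):
        x = layers.GRU(128, return_sequences=True)(x)
        x = layers.Dropout(dropout_rate)(x)
        x = layers.BatchNormalization()(x)

    # Final Dropout
    x = layers.Dropout(dropout_rate)(x)

    # Assumed output head: one sigmoid unit per time step.
    outputs = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
model.summary()
```

With `return_sequences=True` every GRU emits one vector per time step, so the network can output a detection probability at each step rather than a single value per recording.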
Training:
We use the Adam optimizer with a learning rate of 0.0001 and train the model for 15 epochs.
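A minimal sketch of that training setup is shown here. The learning rate and epoch count are taken from the text; the tiny stand-in model, the random data, the batch size, and the binary cross-entropy loss are assumptions that keep the example small and runnable.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

LEARNING_RATE = 1e-4   # from the write-up
EPOCHS = 15            # from the write-up

# Tiny stand-in model and random data so the sketch runs in seconds;
# the real model and dataset are much larger.
inputs = tf.keras.Input(shape=(20, 8))
x = layers.GRU(4, return_sequences=True)(inputs)
outputs = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
model = tf.keras.Model(inputs, outputs)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 20, 8)).astype("float32")
Y = rng.integers(0, 2, size=(16, 20, 1)).astype("float32")

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss="binary_crossentropy",   # assumed loss for per-step 0/1 labels
    metrics=["accuracy"],
)
history = model.fit(X, Y, batch_size=4, epochs=EPOCHS, verbose=0)
```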
The model has 622,913 parameters in total, of which 621,753 are trainable and 1,160 are non-trainable.
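The parameter totals can be checked by hand. The arithmetic below assumes sizes that the write-up does not state (101 input frequency bins, 196 Conv1D filters of length 15, 128 units per GRU, and a single-unit dense output per time step) and the Keras default GRU variant (`reset_after=True`); under those assumptions it reproduces the reported total and non-trainable counts.

```python
def conv1d_params(in_channels, filters, kernel_size):
    # One weight per (kernel position, input channel, filter) plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def batchnorm_params(features):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable).
    return 4 * features


def gru_params(input_dim, units):
    # Three weight blocks (update gate, reset gate, candidate state), each with
    # input weights, recurrent weights, and two bias vectors (reset_after=True).
    return 3 * (units * (input_dim + units) + 2 * units)


def dense_params(input_dim, units):
    return input_dim * units + units


N_FREQ, FILTERS, KERNEL, UNITS = 101, 196, 15, 128   # assumed sizes

total = (
    conv1d_params(N_FREQ, FILTERS, KERNEL)
    + batchnorm_params(FILTERS)        # BatchNormalization after the Conv1D
    + gru_params(FILTERS, UNITS)       # first GRU reads the 196 conv features
    + 2 * gru_params(UNITS, UNITS)     # second and third GRU
    + 3 * batchnorm_params(UNITS)      # BatchNormalization after each GRU
    + dense_params(UNITS, 1)           # assumed per-step output unit
)
# Moving mean and variance of the four BatchNormalization layers.
non_trainable = 2 * FILTERS + 3 * 2 * UNITS

print(total, non_trainable)   # 622913 1160
```

The trainable count is then the total minus the non-trainable parameters.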
Results:
The test set accuracy of the model is 93.03%.
Applications:
1) The model can be used to detect a particular sound and treat it as an activator.
Ex: Hearing “Help” could sound an alarm in a hospital wing.
2) The model can be modified to find particular words in a recording.
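To act on a detection (play a chime, raise an alarm), the per-time-step probabilities from the model have to be turned into discrete trigger events. One possible way is sketched below; the `find_triggers` helper, the threshold, the required run length, and the cooldown are all hypothetical choices, not part of the original project.

```python
import numpy as np


def find_triggers(probabilities, threshold=0.5, min_consecutive=20, cooldown=75):
    """Return the time steps at which a chime or alarm should fire.

    A trigger fires once `min_consecutive` steps in a row exceed `threshold`,
    and at most once every `cooldown` steps.
    """
    triggers = []
    run_length = 0
    last_trigger = -cooldown   # lets the very first detection fire
    for step, p in enumerate(probabilities):
        run_length = run_length + 1 if p > threshold else 0
        if run_length >= min_consecutive and step - last_trigger >= cooldown:
            triggers.append(step)
            last_trigger = step
            run_length = 0
    return triggers


# Synthetic model output: one long detection and one short blip.
probs = np.zeros(1375)
probs[300:350] = 0.9   # 50 confident steps, long enough to trigger
probs[900:905] = 0.9   # 5 steps only, ignored as too short

print(find_triggers(probs))   # [319]
```

Requiring a run of confident steps filters out brief spikes, and the cooldown keeps one spoken keyword from firing the alarm several times.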
Submitted by Aditya Tulsiyan (Adi1423)
Download packets of source code on Coders Packet