By C Koushik
Automated voice recording and transcription using sounddevice and SpeechRecognition modules in Python.
In this project, we first record our voice using the sounddevice Python module and store the recording locally as a ".wav" file. We then pass that audio file to the SpeechRecognition module, whose recognize_google() method sends it to Google's speech recognition API and returns the transcript of the recording as a string.
sounddevice provides functions to play and record NumPy arrays of audio signals, which can then be written to a ".wav" audio file using the wavio module.
The requirements.txt file lists the packages needed to run this application. To install them, open a command prompt in the project directory and run the following command.
pip install -r requirements.txt
Once the packages are installed, the application is good to go. To run the app, type
1) Import necessary packages
import sounddevice as sd
import wavio as wv
import speech_recognition as sr
2) Set the sampling frequency for the recording, which is usually 44100 Hz or 48000 Hz, and the recording duration in seconds.
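A minimal sketch of this step. The values 44100 and 5 are assumptions chosen for illustration, since the original does not state which values it uses. The variable names frequency and duration match the ones used in the later steps.

```python
# Assumed example values; adjust them to your needs.
frequency = 44100  # sampling frequency in Hz (samples per second, per channel)
duration = 5       # recording length in seconds

# sd.rec() takes the recording length as a whole number of frames,
# so the product is converted to an int.
frames = int(duration * frequency)
print(frames)
```

A higher sampling frequency or a longer duration increases the number of frames and therefore the size of the resulting ".wav" file.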
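3) Start the recording with sd.rec(), passing the number of frames (duration multiplied by frequency), the sampling frequency, and the number of channels. sd.rec() returns immediately and records in the background, so call sd.wait() to block until the recording is finished.
recording = sd.rec(int(duration * frequency), samplerate=frequency, channels=2)
sd.wait()  # wait until the recording is complete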
4) Write the recorded NumPy array to a ".wav" audio file, then use that file as the input to the SpeechRecognition module. The recognize_google() method of the Recognizer class returns the transcript as a string.
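wv.write("recording.wav", recording, frequency, sampwidth=2)

def transcribe():
    # The Recognizer object performs the speech-to-text request
    r = sr.Recognizer()
    audio_file = sr.AudioFile('recording.wav')
    with audio_file as source:
        # Read the whole file into an AudioData object
        audio = r.record(source)
    # Send the audio to Google's speech recognition API and print the transcript
    print(r.recognize_google(audio))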
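Before passing recording.wav to recognize_google(), it can help to confirm that the file was written with the expected parameters. The following is a minimal sketch using only Python's standard wave module. describe_wav is a hypothetical helper that is not part of the original project, and the demo writes one second of silence to a temporary stand-in file rather than using a real recording.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Demo: write one second of 16-bit stereo silence to a temporary file
# (a stand-in for recording.wav) and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(2)
    wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit audio
    wav_file.setframerate(44100)
    # One second of silence: frames * channels * bytes per sample
    wav_file.writeframes(b"\x00" * (44100 * 2 * 2))

info = describe_wav(demo_path)
print(info)
```

Calling describe_wav("recording.wav") after step 4 should report the channel count, sample width, and sampling frequency that were passed to sd.rec() and wv.write(), along with a duration close to the one requested.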