Speech to Speech Translation Using Google API

In this tutorial, we will build a Real-Time Voice Translator using Python. This project combines several libraries: Tkinter for the GUI, SpeechRecognition for audio input, deep-translator's GoogleTranslator for translation, and gTTS for text-to-speech. The translator lets users speak into the microphone, see the recognized speech, translate it into a chosen language, and hear the translation spoken back.

Why Create a Voice Translator in Python?

A voice translator offers an excellent opportunity to combine several technologies, including voice recognition, language translation, and GUI development. This project can:

Assist in learning new languages.
Enable communication across linguistic barriers.
Be a foundation for building more advanced multilingual applications.

Prerequisites

Before proceeding, ensure you have the following installed:

Python 3.x
Tkinter for GUI
gTTS, SpeechRecognition, playsound, deep-translator, and PyAudio (SpeechRecognition needs PyAudio for microphone access); os and threading ship with Python
Install the necessary libraries using:

pip install gTTS SpeechRecognition playsound deep-translator PyAudio
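
Before running the full program, it can help to confirm that every module imports. The sketch below is one way to check this with the standard library only. The helper name missing_modules is introduced here for illustration, and the list holds the import names assumed for the packages above (pyaudio is the import name of PyAudio, which SpeechRecognition uses for microphone access).

```python
import importlib.util

# Import names assumed for the packages this tutorial relies on.
REQUIRED_MODULES = [
    "gtts", "speech_recognition", "playsound",
    "deep_translator", "pyaudio", "tkinter",
]

def missing_modules(names):
    """Return the entries of `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any name the script reports as missing can then be installed with pip before moving on.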

Code Snippet

import os
import threading
import tkinter as tk
from gtts import gTTS
from tkinter import ttk
import speech_recognition as sr
from playsound import playsound
from deep_translator import GoogleTranslator

# GUI Configuration
win = tk.Tk()
win.geometry("700x450")
win.title("Real-Time Voice🎙️ Translator🔊")

# Icon Setup
icon = tk.PhotoImage(file="icon.png")
win.iconphoto(False, icon)

# Labels and Text Boxes
input_label = tk.Label(win, text="Recognized Text ⮯")
input_label.pack()
input_text = tk.Text(win, height=5, width=50)
input_text.pack()

output_label = tk.Label(win, text="Translated Text ⮯")
output_label.pack()
output_text = tk.Text(win, height=5, width=50)
output_text.pack()

blank_space = tk.Label(win, text="")
blank_space.pack()

# Language Selection
language_codes = {
    "English": "en", "Hindi": "hi", "Spanish": "es", "French": "fr",
    "German": "de", "Chinese (Simplified)": "zh-CN", "Japanese": "ja",
    "Russian": "ru", "Korean": "ko", "Tamil": "ta", "Telugu": "te"
}
language_names = list(language_codes.keys())

input_lang_label = tk.Label(win, text="Select Input Language:")
input_lang_label.pack()
input_lang = ttk.Combobox(win, values=["auto"] + language_names)
input_lang.set("auto")
input_lang.pack()

output_lang_label = tk.Label(win, text="Select Output Language:")
output_lang_label.pack()
output_lang = ttk.Combobox(win, values=language_names)
output_lang.set("English")
output_lang.pack()

blank_space = tk.Label(win, text="")
blank_space.pack()

keep_running = False

# Translator Logic
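def update_translation():
    # Runs in a worker thread: listen, translate, and speak until stopped.
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        while keep_running:
            try:
                # The dropdowns hold language names; the libraries expect codes.
                src_code = language_codes.get(input_lang.get(), "auto")
                dest_code = language_codes[output_lang.get()]
                audio = recognizer.listen(source, timeout=5, phrase_time_limit=10)
                if src_code == "auto":
                    # recognize_google cannot auto-detect; use its default language.
                    speech_text = recognizer.recognize_google(audio)
                else:
                    speech_text = recognizer.recognize_google(audio, language=src_code)
                input_text.insert(tk.END, f"{speech_text}\n")
                translated_text = GoogleTranslator(source=src_code, target=dest_code).translate(speech_text)
                output_text.insert(tk.END, translated_text + "\n")
                voice = gTTS(translated_text, lang=dest_code)
                voice.save("voice.mp3")
                playsound("voice.mp3")
                os.remove("voice.mp3")
            except sr.WaitTimeoutError:
                continue  # No speech within the timeout; re-check keep_running.
            except Exception as e:
                output_text.insert(tk.END, f"Error: {e}\n")

def run_translator():
    global keep_running
    if not keep_running:  # Ignore repeated clicks while already running.
        keep_running = True
        # daemon=True lets the program exit when the window is closed.
        threading.Thread(target=update_translation, daemon=True).start()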

def kill_execution():
    global keep_running
    keep_running = False

# Buttons
run_button = tk.Button(win, text="Start Translation", command=run_translator)
run_button.place(relx=0.25, rely=0.9, anchor="c")

kill_button = tk.Button(win, text="Kill Execution", command=kill_execution)
kill_button.place(relx=0.5, rely=0.9, anchor="c")

win.mainloop()

Explanation of Key Features

1. Language Selection Dropdowns

Enables users to select the input and output languages.
Defaults to auto for input and English for output.
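
The dropdowns show language names, while the recognition, translation, and speech libraries work with language codes, so the selection has to be converted before it is used. A ttk.Combobox is also editable by default, which means a user can type text that is not in the list. The following is a minimal sketch of a lookup that handles both cases. The function name to_language_code and the shortened dictionary are assumptions made for this example.

```python
# Shortened copy of the tutorial's dictionary, for illustration only.
LANGUAGE_CODES = {
    "English": "en", "Hindi": "hi", "Spanish": "es",
    "Chinese (Simplified)": "zh-CN",
}

def to_language_code(selection, codes=LANGUAGE_CODES):
    """Map a dropdown selection to a language code.

    "auto" is passed through unchanged; unknown text raises ValueError
    so the caller can show a clear message instead of a library error.
    """
    selection = selection.strip()
    if selection == "auto":
        return "auto"
    if selection in codes:
        return codes[selection]
    raise ValueError(f"Unsupported language: {selection!r}")

print(to_language_code("Spanish"))  # es
```

Creating the Combobox with state="readonly" is another option, since it prevents free-form typing altogether.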

2. Speech Recognition and Translation

Captures audio input using the speech_recognition library.
Translates recognized text using deep-translator.
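
The translation step can be isolated in a small function, which makes it easier to test without a network connection. This is only a sketch: translate_text is a hypothetical helper, and skipping the request for blank text or identical source and target codes is an assumption of this example, not something the original code does. The translator class is passed in as a parameter so that GoogleTranslator from deep_translator can be supplied in the real application.

```python
def translate_text(text, source_code, target_code, translator_factory):
    """Translate `text` with a deep_translator-style translator class.

    `translator_factory` is expected to accept source= and target=
    keyword arguments and return an object with a translate() method,
    as GoogleTranslator does.
    """
    text = text.strip()
    # Nothing to translate, or the languages already match.
    if not text or source_code == target_code:
        return text
    translator = translator_factory(source=source_code, target=target_code)
    return translator.translate(text)

# Usage in the application (needs deep-translator and network access):
# from deep_translator import GoogleTranslator
# print(translate_text("Good morning", "auto", "es", GoogleTranslator))
```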

3. Real-Time Audio Playback

Converts translated text to speech using gTTS.
Plays the audio file using playsound.
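
Writing every clip to the same file name can fail if a previous file was left behind or is still in use. One possible alternative, sketched below, is to write each clip to a unique temporary file and delete it even when playback raises an error. The names speak and gtts_synthesize are hypothetical helpers for this example. The synthesis and playback functions are passed in as parameters so the file handling can be tried without audio hardware.

```python
import os
import tempfile

def speak(text, lang_code, synthesize, play):
    """Synthesize `text` into a unique temp MP3, play it, then delete it.

    Returns the path that was used, mainly for debugging.
    """
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # Only the path is needed; the synthesizer reopens it.
    try:
        synthesize(text, lang_code, path)
        play(path)
    finally:
        # Clean up whether or not synthesis or playback succeeded.
        if os.path.exists(path):
            os.remove(path)
    return path

def gtts_synthesize(text, lang_code, path):
    """Save speech for `text` to `path` using gTTS (imported lazily)."""
    from gtts import gTTS
    gTTS(text, lang=lang_code).save(path)

# Usage in the application (needs gTTS, playsound, and network access):
# from playsound import playsound
# speak("Hola", "es", gtts_synthesize, playsound)
```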

OUTPUT

The application starts with a GUI window. Users can:

Speak into the microphone.
View recognized text in the input box.
See and hear the translation in real-time.

OUTPUT 1 : Speech-to-Speech-Translation

OUTPUT 2 : Speech-to-Speech-Translation

Code Explanation: Real-Time Voice Translator

The following sections explain the code for the real-time voice translator, part by part:

1. Importing Libraries

import os
import threading
import tkinter as tk
from gtts import gTTS
from tkinter import ttk
import speech_recognition as sr
from playsound import playsound
from deep_translator import GoogleTranslator

os: To delete the temporary audio file after it has been played.
threading: For running the listen, translate, and speak work in the background without freezing the GUI.
tkinter: To create the graphical user interface.
gTTS: To convert translated text into speech.
speech_recognition: For real-time speech recognition from the microphone.
playsound: To play the generated speech audio.
deep_translator.GoogleTranslator: For translating recognized speech into another language.

2. Tkinter Window Setup

win = tk.Tk()
win.geometry("700x450")
win.title("Real-Time Voice🎙️ Translator🔊")
icon = tk.PhotoImage(file="icon.png")
win.iconphoto(False, icon)

win = tk.Tk(): Creates the main window for the application.
geometry: Sets the size of the window.
title: Sets the title of the application.
iconphoto: Adds a custom icon for the application window.
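
Note that tk.PhotoImage raises an error if icon.png is not found, which stops the program before the window appears. A minimal sketch of a more forgiving setup is shown below. The helper name set_window_icon is an assumption of this example, and it simply leaves the default icon in place when the file is missing or unreadable.

```python
import os

def set_window_icon(win, path="icon.png"):
    """Apply `path` as the window icon; return True on success.

    Returns False (keeping the default icon) if the file is missing
    or Tk cannot read it.
    """
    if not os.path.exists(path):
        return False
    import tkinter as tk  # Imported here so the check above needs no display.
    try:
        icon = tk.PhotoImage(file=path)
    except tk.TclError:
        return False
    win.iconphoto(False, icon)
    return True

# Usage in the application:
# win = tk.Tk()
# set_window_icon(win, "icon.png")
```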

3. Creating Input and Output Text Fields

input_label = tk.Label(win, text="Recognized Text ⮯")
input_label.pack()
input_text = tk.Text(win, height=5, width=50)
input_text.pack()

output_label = tk.Label(win, text="Translated Text ⮯")
output_label.pack()
output_text = tk.Text(win, height=5, width=50)
output_text.pack()

Label: Displays static text (titles for input and output fields).
Text: Creates text boxes for displaying recognized and translated text.
pack(): Places the elements in the window.

4. Language Dropdown Menus

language_codes = {...}  # Dictionary of languages and their codes
language_names = list(language_codes.keys())

input_lang_label = tk.Label(win, text="Select Input Language:")
input_lang_label.pack()
input_lang = ttk.Combobox(win, values=["auto"] + language_names)
input_lang.set("auto")
input_lang.pack()

language_codes: Maps language names to their respective codes (used for recognition, translation, and speech synthesis).
Combobox: A dropdown menu for selecting languages.
set(): Chooses the entry shown by default ("auto" for the input language).

The output language dropdown is set up the same way, without the "auto" entry and with English as the default.

5. Translation Logic

Global Variable and Main Function

keep_running = False

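def update_translation():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        while keep_running:
            ...

keep_running: Tracks whether the application is actively listening and translating.
update_translation(): Runs in a worker thread and repeats the listen, translate, and speak cycle for as long as keep_running is True. Keeping the loop inside the worker thread, rather than rescheduling the function with win.after(), means the blocking microphone call never runs on the GUI thread.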

Speech Recognition

r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)
    speech_text = r.recognize_google(audio)

sr.Recognizer: Initializes the recognizer object.
sr.Microphone(): Accesses the default microphone.
r.listen(): Captures audio input.
r.recognize_google(): Converts the audio input to text using Google’s speech recognition API.
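
The capture and recognition calls can be wrapped in one function that also calibrates for background noise and limits how long a single phrase may run. This is a sketch rather than the tutorial's own code: listen_and_recognize is a hypothetical helper, and the 0.5 second calibration and 10 second phrase limit are arbitrary values chosen for illustration. It relies on the adjust_for_ambient_noise(), listen(), and recognize_google() methods of sr.Recognizer.

```python
def listen_and_recognize(recognizer, source, language_code=None,
                         phrase_time_limit=10):
    """Capture one phrase from `source` and return the recognized text.

    `recognizer` is expected to behave like sr.Recognizer and `source`
    like an opened sr.Microphone.
    """
    # Sample the background noise briefly to set the energy threshold.
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.listen(source, phrase_time_limit=phrase_time_limit)
    # recognize_google has no auto-detect mode, so "auto" or None falls
    # back to the library's default language.
    if language_code in (None, "auto"):
        return recognizer.recognize_google(audio)
    return recognizer.recognize_google(audio, language=language_code)

# Usage in the application (needs SpeechRecognition, PyAudio, a microphone):
# import speech_recognition as sr
# with sr.Microphone() as source:
#     print(listen_and_recognize(sr.Recognizer(), source, "hi"))
```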

Translation and Text-to-Speech

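src_code = language_codes.get(input_lang.get(), "auto")
dest_code = language_codes[output_lang.get()]
translated_text = GoogleTranslator(source=src_code, target=dest_code).translate(speech_text)
voice = gTTS(translated_text, lang=dest_code)
voice.save('voice.mp3')
playsound('voice.mp3')
os.remove('voice.mp3')

language_codes lookup: Converts the names chosen in the dropdowns into language codes, since gTTS expects a code such as "es" rather than a name such as "Spanish".
GoogleTranslator: Translates the recognized text into the selected output language.
gTTS: Converts translated text into speech.
playsound: Plays the generated audio.
os.remove: Deletes the audio file after playing it.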

6. Starting and Stopping Translation

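def run_translator():
    global keep_running
    if not keep_running:
        keep_running = True
        threading.Thread(target=update_translation, daemon=True).start()

def kill_execution():
    global keep_running
    keep_running = False

run_translator(): Starts the translation process by setting keep_running to True and running update_translation() in a separate thread. The keep_running check ignores repeated clicks while the translator is already running, and daemon=True lets the program exit when the window is closed.
kill_execution(): Stops the translation process by setting keep_running to False. The worker thread notices the change the next time it checks the flag.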
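
A threading.Event is a common alternative to a global boolean for starting and stopping a background loop. The sketch below illustrates the pattern in isolation. The class name TranslatorWorker and its step callback are assumptions of this example; in the application, step would be one listen, translate, and speak cycle.

```python
import threading
import time

class TranslatorWorker:
    """Run `step` repeatedly in a background thread until stopped."""

    def __init__(self, step):
        self._step = step
        self._stop = threading.Event()
        self._thread = None

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def start(self):
        """Start the loop; return False if it is already running."""
        if self.is_running():
            return False
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return True

    def stop(self, timeout=None):
        """Ask the loop to finish and wait up to `timeout` seconds."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout)

    def _run(self):
        while not self._stop.is_set():
            self._step()

if __name__ == "__main__":
    worker = TranslatorWorker(lambda: time.sleep(0.1))
    worker.start()
    time.sleep(0.3)
    worker.stop(timeout=2)
    print("running:", worker.is_running())
```

With this design, the Start and Kill buttons would call worker.start and worker.stop instead of changing a global variable.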

7. Buttons for User Control

run_button = tk.Button(win, text="Start Translation", command=run_translator)
run_button.place(relx=0.25, rely=0.9, anchor="c")

kill_button = tk.Button(win, text="Kill Execution", command=kill_execution)
kill_button.place(relx=0.5, rely=0.9, anchor="c")

Button: Creates buttons for starting and stopping the translation process.
place(): Positions the buttons in the window.

8. Event Loop

win.mainloop()

Keeps the application running, listening for user inputs and interactions.

Improvements and Suggestions

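Error Handling: The code catches every exception in one generic handler and prints the message in the output box. Consider catching sr.UnknownValueError (speech not understood) and sr.RequestError (API unreachable) separately, and displaying pop-ups for better user feedback.
Thread Safety: Consider replacing the keep_running global with a threading.Event, and route widget updates through the main thread, since Tkinter widgets are not designed to be modified from worker threads.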
UI Enhancement: Add additional UI features like a progress bar or status indicator.
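
One way to approach the thread safety suggestion is to let the worker thread put messages on a queue and have the GUI thread apply them to the widgets. The sketch below shows only the queue handling. The names post and drain, and the "input" and "output" keys, are assumptions of this example.

```python
import queue

messages = queue.Queue()

def post(box_name, text):
    """Called from the worker thread: enqueue text instead of touching widgets."""
    messages.put((box_name, text))

def drain(handlers, pending=messages):
    """Called on the GUI thread: deliver queued messages, return how many."""
    delivered = 0
    while True:
        try:
            box_name, text = pending.get_nowait()
        except queue.Empty:
            return delivered
        handlers[box_name](text)
        delivered += 1

# Usage in the Tkinter application (hypothetical wiring):
# handlers = {
#     "input": lambda t: input_text.insert(tk.END, t + "\n"),
#     "output": lambda t: output_text.insert(tk.END, t + "\n"),
# }
# def poll():
#     drain(handlers)
#     win.after(100, poll)  # Runs on the GUI thread every 100 ms.
# poll()
```

The worker thread would then call post("input", speech_text) and post("output", translated_text) instead of inserting into the text boxes directly.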
