Convert Speech to Text and Text to Speech in Python

Introduction:

In this tutorial, we will learn how to convert speech to text, and text to speech, in Python. The ability to move between speech and text is in high demand in many applications, from voice control to transcription services. Python's ecosystem includes libraries and APIs that make both conversions relatively easy.

Speech recognition is important in applications such as home automation and voice assistants. In this tutorial, we'll explore how to use Python for both conversions, so that developers can build applications that bridge spoken and written communication.

How to Convert the Speech to Text in Python?

The first step in converting speech to text is to capture the speech as audio. The SpeechRecognition library provides a simple interface to several speech recognition engines, including Google Speech Recognition, CMU Sphinx, and Wit.ai. Follow these steps to convert speech to text -

Step 1:

First, we need to install the SpeechRecognition library with pip. Capturing audio from a microphone additionally requires the PyAudio package -

pip install SpeechRecognition
pip install PyAudio

Step 2:

Then, we need to import the necessary library and initialize a recognizer object -

import speech_recognition as s

# The recognizer object is reused in the following steps
recognizer = s.Recognizer()

Step 3:

In this step, we use the microphone to capture voice input -

with s.Microphone() as source:
    print("Listening here...")
    # Block until a phrase has been spoken, then return it as audio data
    audio_file = recognizer.listen(source)

Step 4:

Now, we use the recognizer object to recognize the speech and convert it to text, as shown in the following code -

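try:
    # Send the captured audio to Google's web speech API (needs internet access)
    txt = recognizer.recognize_google(audio_file)
    print("You said here:", txt)
except s.UnknownValueError:
    # Raised when the engine cannot make out any words in the audio
    print("I could not recognized what you are said, sorry!!")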

The steps above show a simple implementation of speech-to-text conversion using Google's speech recognition engine. The recognize_google method performs the actual recognition: it takes the captured audio as input and returns the recognized text, which is then printed to the console. It is important to handle possible problems, such as speech that cannot be recognized.

The SpeechRecognition library offers many other options, such as specifying the language, switching the speech recognition engine, and transcribing audio files instead of microphone input. Feel free to explore the library's documentation for more details. Now that we are done converting speech to text, let us move on to the next step, which is converting text to speech.
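
Before moving on, the audio-file and language options mentioned above can be sketched as follows. This is only a minimal sketch under a few assumptions: the helper names transcribe_recording and transcribe_file are hypothetical and not part of the library, while AudioFile, record() and the language argument of recognize_google() are SpeechRecognition calls. The import is placed inside transcribe_file so that the first helper can be tried with any recognizer-like object.

```python
def transcribe_recording(recognizer, source, language="en-US", max_seconds=None):
    """Read audio from an already-opened source and return the recognized text.

    `recognizer` is expected to behave like speech_recognition.Recognizer.
    """
    # record() reads until the source ends, or for max_seconds if given
    audio = recognizer.record(source, duration=max_seconds)
    # language takes a tag such as "en-US" or "en-GB"
    return recognizer.recognize_google(audio, language=language)


def transcribe_file(path, language="en-US"):
    """Transcribe a WAV, AIFF, or FLAC file (hypothetical helper)."""
    # Imported here so transcribe_recording has no hard dependency on the package
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        return transcribe_recording(recognizer, source, language=language)
```

For example, transcribe_file("speech.wav", language="en-GB") would return the transcript of a file named speech.wav (an assumed file name). Recognition errors are not caught here, so they propagate to the caller.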

How to Convert the Text to Speech in Python?

Converting text to speech involves synthesizing natural-sounding speech from text input. Several Python libraries serve this purpose, such as pyttsx3. Follow these steps to convert text to speech in Python -

Step 1:

First, we need to install the pyttsx3 library with pip -

pip install pyttsx3

Step 2:

Then, we need to import the library and initialize a speech synthesis engine -

import pyttsx3 as p
e = p.init()

Step 3:

In this optional step, we set the properties of the speech synthesis engine -

# Set the speaking rate in words per minute
e.setProperty("rate", 120)
# Set the volume level, from 0.0 (silent) to 1.0 (full volume)
e.setProperty("volume", 0.6)

Step 4:

Finally, we convert the text to speech using the say method, as shown in the code below -

txt = "Hello, Welcome to JavaTpoint."
e.say(txt)
e.runAndWait()

In the steps above, we first call pyttsx3.init() to create an instance of the speech synthesis engine. You can customize the output by adjusting properties such as the speaking rate and volume. Finally, the say method queues the text to be spoken, and the runAndWait method processes the queue and blocks until the speech has finished playing. It is worth noting that pyttsx3 supports several speech synthesis engines, including SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux. See the pyttsx3 documentation to learn more about the available options and configuration.
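
The customization described above can be sketched as a pair of small helpers. This is an illustration rather than the library's prescribed usage: pick_voice, configure and speak are hypothetical names, and the available voice names depend on the operating system and driver, so any keyword such as "english" is an assumption. The "rate", "volume", "voice" and "voices" properties are the ones pyttsx3 exposes through getProperty() and setProperty().

```python
def pick_voice(voices, keyword):
    """Return the first voice whose name contains keyword (case-insensitive), or None."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice
    return None


def configure(engine, rate=150, volume=0.8, voice_keyword=None):
    """Apply rate, volume and an optional voice to a pyttsx3-style engine."""
    engine.setProperty("rate", rate)
    # Keep the volume inside the 0.0 to 1.0 range that pyttsx3 expects
    engine.setProperty("volume", max(0.0, min(1.0, volume)))
    if voice_keyword:
        voice = pick_voice(engine.getProperty("voices"), voice_keyword)
        if voice is not None:
            engine.setProperty("voice", voice.id)
    return engine


def speak(text, rate=140, volume=0.7, voice_keyword=None):
    """Speak text aloud with the chosen settings (hypothetical helper)."""
    # Imported here so the helpers above can be used without the package
    import pyttsx3

    engine = configure(pyttsx3.init(), rate, volume, voice_keyword)
    engine.say(text)
    engine.runAndWait()
```

Calling speak("Hello, Welcome to JavaTpoint.", voice_keyword="english") would use the first installed voice whose name contains "english", and keep the default voice if none matches.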

For the speech-to-text code given earlier, you can expect the following output if the voice input is successfully recognized -

Listening here...
You said here: Hello, Welcome to JavaTpoint.

In this example, the program uses the microphone to listen for audio input. After capturing the audio, it recognizes the speech and converts it to text using the Google speech recognition engine. The recognized text, in this case "Hello, Welcome to JavaTpoint.", is then printed to the console.

If the input audio cannot be understood or recognized, you will see the following output -

Listening here...
I could not recognized what you are said, sorry!!

How to handle exceptions and advanced configuration?

When converting speech to text, it is important to handle exceptions and to consider advanced settings that increase the accuracy and efficiency of the conversion. Here are some tips -

Exception Handling: In the previous example, we used a try-except block to catch the UnknownValueError exception, which is raised when the speech cannot be understood. You can extend exception handling to other errors, such as RequestError (for network or API-related issues) or WaitTimeoutError (if no speech is detected within the specified timeout period). By handling exceptions correctly, you can show helpful error messages or fall back to another strategy when speech recognition fails.
Selection of the language: The SpeechRecognition library allows you to specify the language of the speech. For example, with recognize_google you can pass language="en-US" (for United States English) or language="en-GB" (for British English). This can improve the accuracy of the recognition, especially for words or phrases that differ between language variants. For more information about language options and how to configure them, see the library's documentation.
Choice of the recognition engine: Although the previous code examples use the Google speech recognition engine, the SpeechRecognition library supports other engines such as CMU Sphinx, Wit.ai, and Microsoft Azure Speech. Each engine has its advantages and limitations, so you can try several engines to find the one that best suits your needs.
Configuration of the text-to-speech conversion: You can adjust several properties of the pyttsx3 engine. For example, you can choose among the voices installed on your system and adjust the speaking rate and volume. See the pyttsx3 documentation for details on the available properties and their settings.
Audio files handling: The SpeechRecognition library can process audio files as well as real-time microphone input. Instead of capturing from the microphone with the listen() method, open the file with AudioFile, read it with the record() method, and pass the resulting audio data to recognize_google(). This allows you to convert pre-recorded speech from audio files to text.
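
The exception-handling and engine tips above can be combined into one sketch. It is an illustration under stated assumptions, not a definitive implementation: transcribe_with_fallback and listen_and_transcribe are hypothetical helper names, microphone capture assumes PyAudio is installed, and the CMU Sphinx fallback assumes the optional pocketsphinx package is installed. The exception classes, listen(), recognize_google() and recognize_sphinx() come from the SpeechRecognition library.

```python
def transcribe_with_fallback(audio, engines, recoverable):
    """Try each (name, recognize) pair in order.

    Returns (name, text) from the first engine that succeeds, or (None, None)
    if every engine raised one of the `recoverable` exception types.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except recoverable as err:
            print(f"{name} failed: {err!r}")
    return None, None


def listen_and_transcribe(timeout=5, language="en-US"):
    """Capture one phrase from the microphone and transcribe it (hypothetical helper)."""
    # Imported here so transcribe_with_fallback can be used without the package
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            print("Listening here...")
            # Raises WaitTimeoutError if no speech starts within `timeout` seconds
            audio = recognizer.listen(source, timeout=timeout)
    except sr.WaitTimeoutError:
        print("No speech detected before the timeout.")
        return None

    engines = [
        # Online engine first, with an explicit language tag
        ("google", lambda a: recognizer.recognize_google(a, language=language)),
        # Offline fallback with its default language model
        ("sphinx", recognizer.recognize_sphinx),
    ]
    # UnknownValueError: unintelligible speech; RequestError: engine unreachable or unavailable
    name, text = transcribe_with_fallback(
        audio, engines, (sr.UnknownValueError, sr.RequestError)
    )
    if text is None:
        print("No engine could transcribe the audio.")
    else:
        print(f"You said here ({name}):", text)
    return text
```

Passing the recoverable exception types as an argument keeps the fallback loop independent of any one engine, so engines can be added or reordered without touching the error handling.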

Conclusion:

In this tutorial, we learned how to convert speech to text, and text to speech, in Python. Developers can easily add these conversions to their applications using libraries such as SpeechRecognition and pyttsx3. The tutorial also covered exception handling and advanced settings that improve accuracy and customization. Speech-to-text and text-to-speech conversion have many applications, including transcription services and voice assistants.
