Speech Recognition in Python Tutorial

Speech recognition helps computers understand spoken words. It converts speech into text. This technology lets you build apps that use natural language.

Speech Recognition

What is Speech Recognition?

Speech recognition turns spoken language into text. It analyzes audio signals to find patterns and matches them to words. This helps machines understand human speech.

Why is Speech Recognition useful for Applications?

Speech recognition lets users speak instead of type, which saves time, can cut down on typing mistakes, and makes apps more accessible. You can transcribe meetings, dictate documents, and control smart devices with your voice. It is especially useful for virtual assistants, transcription services, and voice-controlled systems.

3 Steps to Implement Speech Recognition in Python

You can use the SpeechRecognition library in Python. This library works with many speech recognition engines and makes the process simple.

Step 1: Install Necessary Libraries

First, install the SpeechRecognition library. Also install PyAudio. PyAudio lets Python use your microphone.

Install SpeechRecognition: Open your terminal. Run this command:

pip install SpeechRecognition

Install PyAudio: PyAudio installation can be tricky.

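For Windows users: Recent PyAudio releases publish Windows wheels on PyPI, so pip install pyaudio often works directly. If pip tries to build from source and fails, download a pre-compiled .whl file that matches your Python version and system architecture. Then, install it using pip: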

pip install path/to/your/PyAudio-0.2.11-cpXX-cpXX-win_amd64.whl

You can also try pipwin, an older helper tool that may not work with current Python versions:

pip install pipwin
pipwin install pyaudio

For Linux users: Use your system’s package manager. On Debian or Ubuntu:

sudo apt-get install python3-pyaudio

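For macOS users: PyAudio needs the PortAudio library. Install it first, for example with Homebrew, then install PyAudio:

brew install portaudio
pip install pyaudio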
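
Before moving on, you may want to confirm that both packages can be imported. Here is a minimal sketch that uses only the standard library. The helper name missing_packages is made up for this tutorial and is not part of either library. Note that the import names differ from the pip names.

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that Python cannot import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# SpeechRecognition is imported as "speech_recognition",
# and PyAudio is imported as "pyaudio".
missing = missing_packages(["speech_recognition", "pyaudio"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("SpeechRecognition and PyAudio are both importable.")
```

If a name shows up as missing, rerun the matching install command for your system.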

Step 2: Capture Audio from Your Microphone

Now write Python code to capture audio from your microphone. It uses the Recognizer and Microphone classes from the SpeechRecognition library.

import speech_recognition as sr

# Create a Recognizer object
r = sr.Recognizer()

# Use the microphone
with sr.Microphone() as source:
    print("Speak something...")
    # Adjust for background noise
    r.adjust_for_ambient_noise(source)
    # Listen for audio
    audio = r.listen(source)
    print("Audio captured.")

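This code sets up a Recognizer, opens your microphone, and captures speech. The adjust_for_ambient_noise method does not filter the audio. It samples the background noise for a short time and sets the recognizer's energy threshold, so the recognizer can tell speech from silence more reliably.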
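
By default, listen() waits until it hears speech. If you want to limit the wait, you can pass the timeout and phrase_time_limit parameters. Here is a small sketch of that idea. The wrapper name capture_phrase and its default values are assumptions for illustration; adjust_for_ambient_noise(source, duration=...) and listen(source, timeout=..., phrase_time_limit=...) are the library calls it relies on.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1.0,
                   timeout=5, phrase_time_limit=10):
    """Calibrate against background noise, then record one phrase.

    recognizer: an sr.Recognizer instance
    source:     an sr.Microphone that is already open (inside a with block)
    """
    # Sample the room for calibrate_seconds and set the energy threshold
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: maximum seconds to wait for speech to start
    #   (listen raises sr.WaitTimeoutError if nothing is heard in time)
    # phrase_time_limit: maximum seconds to record once speech has started
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


# Intended use with the real library:
#
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio = capture_phrase(r, source)
```

The helper takes the recognizer and source as arguments, so you can reuse one Recognizer across many recordings.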

Step 3: Convert Speech to Text

After capturing audio, convert it to text using one of the recognize_* methods. The SpeechRecognition library supports various APIs, including Google Web Speech API (free for basic use), CMU Sphinx (offline), Google Cloud Speech, IBM Speech-to-Text, and Microsoft Azure Speech.

Here is an example. It uses the Google Web Speech API:

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Say something!")
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

# The microphone is closed here; recognition only needs the captured audio
try:
    # Use Google Web Speech API
    text = r.recognize_google(audio)
    print("You said: " + text)
except sr.UnknownValueError:
    print("Could not understand audio.")
except sr.RequestError as e:
    print(f"Could not request results from Google Speech Recognition service; {e}")

This code calls recognize_google() to convert the audio to text. It handles two errors: UnknownValueError, raised when the speech cannot be understood, and RequestError, raised when the API cannot be reached or rejects the request.

Using Audio Files for Speech Recognition

You can also transcribe pre-recorded audio. Use the AudioFile class. It reads WAV (PCM), AIFF, and FLAC files, so convert other formats such as MP3 to one of these first.

import speech_recognition as sr
from os import path

# Path to your audio file
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "your_audio_file.wav")

r = sr.Recognizer()

with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire file

# The file is closed here; recognition only needs the recorded audio
try:
    text = r.recognize_google(audio)
    print("Text from audio file: " + text)
except sr.UnknownValueError:
    print("Could not understand audio from file.")
except sr.RequestError as e:
    print(f"Could not request results from Google Speech Recognition service; {e}")

Replace “your_audio_file.wav” with the name of a file stored in the same folder as the script, or set AUDIO_FILE to a full path. The code reads the whole file into memory. Then it transcribes it the same way as microphone input.
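
If a WAV file fails to load, it can help to inspect its format first. Here is a minimal sketch that uses Python's built-in wave module. The helper name describe_wav is made up for this tutorial. The wave module reads uncompressed PCM WAV files and raises wave.Error for files it cannot parse, which is a useful early warning.

```python
import wave


def describe_wav(wav_path):
    """Return basic format facts about a PCM WAV file."""
    with wave.open(wav_path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


# Intended use:
#
#   info = describe_wav("your_audio_file.wav")
#   print(info)
```

The duration is also handy when you decide whether to read the whole file at once with r.record(source).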

Best Practices for Speech Recognition

  • Handle Errors: Speech recognition isn’t perfect. Use try-except blocks to manage UnknownValueError for unclear speech and RequestError for API issues.
  • Adjust for Ambient Noise: Always use r.adjust_for_ambient_noise(source) when using a microphone to calibrate the recognizer.
  • Choose the Right API: For offline recognition, use recognize_sphinx(), which needs the PocketSphinx package installed. For better accuracy, opt for cloud APIs like Google Cloud Speech or Microsoft Azure Speech (may require API keys and incur costs).
  • Clear Audio Input: Audio quality impacts accuracy. Use a clear microphone and minimize background noise.
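
The error handling and API choice tips can be combined. Here is one possible sketch that tries several engines in order and moves on when one fails. The helper name transcribe_with_fallback and its parameters are assumptions for illustration, not part of the SpeechRecognition library. You pass in the recognize_* methods and the exception types you want to treat as recoverable.

```python
def transcribe_with_fallback(audio, engines, recoverable):
    """Try each (label, recognize_function) pair in order.

    Returns (label, text) from the first engine that succeeds, or
    (None, None) if every engine raised one of the recoverable errors.
    """
    for label, recognize in engines:
        try:
            return label, recognize(audio)
        except recoverable as error:
            print(f"{label} failed: {error!r}")
    return None, None


# Intended use with the real library (the offline engine also needs
# PocketSphinx):
#
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   engines = [("google", r.recognize_google), ("sphinx", r.recognize_sphinx)]
#   errors = (sr.UnknownValueError, sr.RequestError)
#   label, text = transcribe_with_fallback(audio, engines, errors)
```

Putting the online engine first means the offline engine is used only when the request fails or the speech is unclear. Reverse the order if you prefer to stay offline when possible.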

Conclusion

Speech recognition in Python is a powerful tool to build interactive applications. The SpeechRecognition library makes it easy. You can convert speech to text quickly using this library.

First, install SpeechRecognition and PyAudio. Then, use Recognizer and Microphone for live audio. Use AudioFile for pre-recorded audio. Finally, use a recognize_* method to transcribe.

Try building a voice command script today. Explore the SpeechRecognition library’s documentation.
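
As a starting point for a voice command script, here is a minimal sketch that maps recognized text to an action. Everything in it is hypothetical: the function name find_command, the command phrases, and the action labels are examples only. It expects the text returned by a recognize_* method.

```python
def find_command(transcript, commands):
    """Return the action for the first command phrase spoken in transcript.

    commands maps a lowercase phrase to an action label.
    Matching is on whole words and ignores case and extra spaces.
    """
    spoken = " ".join(transcript.lower().split())
    for phrase, action in commands.items():
        # Pad with spaces so "stop" does not match inside "unstoppable"
        if f" {phrase} " in f" {spoken} ":
            return action
    return None


# Example phrases and action labels (made up for this sketch)
COMMANDS = {"lights on": "LIGHTS_ON", "lights off": "LIGHTS_OFF", "stop": "STOP"}

print(find_command("Please turn the lights on", COMMANDS))
```

This simple matching does not strip punctuation, so you may want to clean the text further for your own commands.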

Marina Chatterjee
Marina is a content marketer who takes keen interest in the scopes of innovation in today's digital economy. She has formerly worked with Amazon and a Facebook marketing partner to help them find their brand language. In a past life, she was an academic who taught wide-eyed undergrad Eng-lit students and made Barthes roll in his grave.
