Build a simple CLI-based voice assistant using PyAudio, SpeechRecognition, pyttsx3, and SerpApi

1. Introduction

As you can see from the title, this is a demo project: a very basic voice assistant script that answers your spoken questions in the terminal based on Google search results.

You can find the full code in the GitHub repository: dimitryzub/serpapi-demo-projects/speech-recognition/cli-based/

Subsequent blog posts will cover:

A web-based solution using Flask, HTML, CSS, and JavaScript.
An Android and Windows solution using Flutter and Dart.

2. What we will build in this blog post

2.1 Environment preparation

First, let’s make sure we are working in a separate environment and that the libraries required for this project are installed correctly. The most difficult one to install is (possibly) pyaudio. If you run into problems, refer to the following:

[Solution] Fix PyAudio pip installation error on win 32/64-bit operating system

2.2 Virtual environment and library installation

Before we start installing the libraries, we need to create and activate a new environment for this project:

# if you're on Linux-based systems
$ python -m venv env && source env/bin/activate
$ (env) <path>

# if you're on Windows and using Bash terminal
$ python -m venv env && source env/Scripts/activate
$ (env) <path>

# if you're on Windows and using CMD
python -m venv env && .\env\Scripts\activate
$ (env) <path>

Explanation: python -m venv env tells Python to run the venv module (-m) and create an environment in a folder named env. && means "and": the second command runs only if the first one succeeds. source env/bin/activate (or env/Scripts/activate on Windows) activates the environment, so any libraries you install with pip go into that environment only.
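To double-check that the environment is really active, you can compare sys.prefix with sys.base_prefix: when Python runs inside an environment created with venv, the two differ. A minimal sketch; the helper name in_virtualenv is hypothetical:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-created environment."""
    # venv points sys.prefix at the environment folder (e.g. env/),
    # while sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the environment activated, and it should report True; run it with the system interpreter, and it reports False.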

Now install all the required libraries (python-dotenv provides the dotenv module the script uses to load the API key):

pip install rich pyttsx3 SpeechRecognition google-search-results python-dotenv

Next comes pyaudio. Keep in mind that installing pyaudio may fail with errors, and you may need to do some additional research to resolve them.

On Debian/Ubuntu-based Linux distributions, first install the development dependencies that pyaudio needs:

$ sudo apt-get install -y libasound-dev portaudio19-dev
$ pip install pyaudio

If you’re using Windows, it’s even simpler (tested with CMD and Git Bash):

pip install pyaudio
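Once everything is installed, a quick way to verify it is to check that each package can be found under its import name, which differs from the pip name for several of them (SpeechRecognition is imported as speech_recognition, google-search-results as serpapi, python-dotenv as dotenv). A minimal sketch using importlib.util.find_spec, which locates a module without importing it; the helper name missing_modules is hypothetical:

```python
import importlib.util

# pip package name -> module name used in the script's import statements
REQUIRED = {
    "rich": "rich",
    "pyttsx3": "pyttsx3",
    "SpeechRecognition": "speech_recognition",
    "google-search-results": "serpapi",
    "python-dotenv": "dotenv",
    "pyaudio": "pyaudio",
}


def missing_modules(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All required packages are installed.")
```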

3. Complete code

import os
import speech_recognition
import pyttsx3  # text-to-speech library; not used by this version of the script
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

load_dotenv('.env')  # load API_KEY from the .env file into the environment
console = Console()

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

    while True:
        with console.status(status='Listening to you...', spinner='point') as progress_bar:
            try:
                with speech_recognition.Microphone() as mic:
                    recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                    audio = recognizer.listen(mic)
                    text = recognizer.recognize_google(audio_data=audio).lower()
                    console.print(f'[bold]Recognized text[/bold]: {text}')

                    progress_bar.update(status='Looking for answers...', spinner='line')
                    params = {
                        'api_key': os.getenv('API_KEY'),
                        'device': 'desktop',
                        'engine': 'google',
                        'q': text,
                        'google_domain': 'google.com',
                        'gl': 'us',
                        'hl': 'en'
                    }
                    search = GoogleSearch(params)
                    results = search.get_dict()

                    try:
                        if 'answer_box' in results:
                            try:
                                primary_answer = results['answer_box']['answer']
                            except KeyError:
                                primary_answer = results['answer_box']['result']
                            console.print(f'[bold]The answer is[/bold]: {primary_answer}')
                        elif 'knowledge_graph' in results:
                            secondary_answer = results['knowledge_graph']['description']
                            console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
                        else:
                            # no answer box and no knowledge graph: handled below as "answer not found"
                            raise KeyError('answer_box')

                        progress_bar.stop()  # the answer was found -> stop the progress bar
                        user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

                        if user_prompt_to_continue_if_answer_is_success == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue  # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
                    except KeyError:
                        progress_bar.stop()  # no answer was found -> stop the progress bar
                        error_user_prompt = input("Sorry, I didn't find the answer. Would you like to rephrase it? (y/n) ")

                        if error_user_prompt == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue  # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
            except speech_recognition.UnknownValueError:
                progress_bar.stop()  # the speech wasn't recognized -> stop the progress bar
                user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

                if user_prompt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue  # run speech recognition again until the user answers 'n'
                else:
                    progress_bar.stop()
                    console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                    break

if __name__ == '__main__':
    main()

4. Code Description

Import libraries:

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

rich: a Python library for beautiful formatting in the terminal.
pyttsx3: a Python text-to-speech conversion library that works offline.
SpeechRecognition: a Python library for converting speech to text.
google-search-results: SerpApi's Python API wrapper, which can parse data from 15+ search engines.
os: reads the secret environment variable, in this case the SerpApi API key.
dotenv: loads environment variables (the SerpApi API key) from the .env file. The file can have any name (for example, .napoleon), as long as you pass that path to load_dotenv().
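A .env file is plain text with one KEY=value pair per line, e.g. API_KEY=your_serpapi_api_key (placeholder value). python-dotenv supports more syntax than this (for example, export prefixes and variable expansion), so the sketch below is only a simplified, stdlib-only illustration of what load_dotenv('.env') does; parse_env and load_env_file are hypothetical helpers, not part of python-dotenv:

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # drop surrounding whitespace and quote characters
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Simplified stand-in for dotenv.load_dotenv(path).

    Like load_dotenv's default (override=False), variables that already
    exist in the environment are not overwritten.
    """
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

In the actual script you would keep calling load_dotenv('.env'); afterwards, os.getenv('API_KEY') returns the key.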

Create a rich Console(). It is used to beautify the terminal output (spinners, bold text, etc.):

console = Console()

Define the main function, which contains all of the script's logic:

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

At the beginning of the function, we create a speech_recognition.Recognizer() instance, and console.rule prints the following title line:

─────────────────────────────────── SerpApi Voice Assistant Demo Project ────────────────────────────────────

The next step is to create a while loop that will constantly listen to microphone input to recognize speech:

while True:
    with console.status(status='Listening to you...', spinner='point') as progress_bar:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                audio = recognizer.listen(mic)

                text = recognizer.recognize_google(audio_data=audio).lower()
                console.print(f'[bold]Recognized text[/bold]: {text}')

console.status: a rich status spinner, used for decorative purposes only.
speech_recognition.Microphone(): starts capturing input from the microphone.
recognizer.adjust_for_ambient_noise: calibrates the energy threshold based on the ambient noise level, sampled here for 0.1 seconds (duration=0.1).
recognizer.listen: records the user's phrase from the microphone.
recognizer.recognize_google: uses the Google Speech Recognition API to convert the recorded audio to text; lower() converts the recognized text to lowercase.
console.print: rich's version of print, which supports markup such as [bold] or [italic].
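The energy threshold is what separates silence from speech: sounds below it are treated as silence, sounds above it as speech. How SpeechRecognition calibrates it internally is not shown here; the sketch below only illustrates the idea with hypothetical helpers, rms_energy and is_speech, computing the root-mean-square energy of a chunk of audio. The assumption that chunks are 16-bit little-endian mono PCM is ours:

```python
import math
import struct


def rms_energy(pcm: bytes) -> float:
    """Root-mean-square energy of 16-bit little-endian mono PCM audio."""
    count = len(pcm) // 2  # two bytes per sample
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm: bytes, energy_threshold: float) -> bool:
    """Treat a chunk as speech only if its energy exceeds the threshold."""
    return rms_energy(pcm) > energy_threshold


silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 3000, -3000, 3000, -3000)
print(rms_energy(silence), rms_energy(loud))  # 0.0 3000.0
print(is_speech(loud, energy_threshold=300))  # True
```

Calibrating for ambient noise amounts to picking a threshold just above the energy of the background sound, so that only louder input such as speech crosses it.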

spinner='point' selects the spinner animation shown while listening (run python -m rich.spinner to see the list of available spinners).

After that, we update the status and define the SerpApi search parameters:

progress_bar.update(status='Looking for answers...', spinner='line')
params = {
    'api_key': os.getenv('API_KEY'), # serpapi api key
    'device': 'desktop', # device used for the search: desktop (default), tablet, or mobile
    'engine': 'google', # serpapi parsing engine: https://serpapi.com/status
    'q': text, # search query
    'google_domain': 'google.com', # google domain: https://serpapi.com/google-domains
    'gl': 'us', # country of the search: https://serpapi.com/google-countries
    'hl': 'en' # language of the search: https://serpapi.com/google-languages
    # other parameters such as locations: https://serpapi.com/locations-api
}
search = GoogleSearch(params) # where data extraction happens on the SerpApi backend
results = search.get_dict() # JSON -> Python dict
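Note that os.getenv returns None when API_KEY is not set. Rather than sending a search without a key, you could fail fast with a clear message. A small sketch; build_search_params is a hypothetical helper, and the parameter values are the same as in the script:

```python
import os
from typing import Optional


def build_search_params(query: str, api_key: Optional[str] = None) -> dict[str, str]:
    """Return the script's SerpApi parameters, failing early if no API key is available."""
    api_key = api_key or os.getenv("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set; add it to your .env file")
    return {
        "api_key": api_key,
        "device": "desktop",
        "engine": "google",
        "q": query,
        "google_domain": "google.com",
        "gl": "us",
        "hl": "en",
    }
```

In the loop, the params dictionary would then become search = GoogleSearch(build_search_params(text)).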

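progress_bar.update updates progress_bar with a new status (the text printed in the console) and switches to the spinner='line' animation.

GoogleSearch(params) then sends the request to SerpApi's Google Search Engine API, which extracts the data from Google search results on the SerpApi backend, and get_dict() returns the JSON response as a Python dictionary.

The following portion of the code looks for an answer in the results: it first checks the answer box (its answer key, then its result key), then the knowledge graph description. If no answer is found, the KeyError handler stops the progress bar and asks the user whether to rephrase the question. In both cases, answering y restarts speech recognition, and any other answer ends the program: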

try:
    if 'answer_box' in results:
        try:
            primary_answer = results['answer_box']['answer']
        except KeyError:
            primary_answer = results['answer_box']['result']
        console.print(f'[bold]The answer is[/bold]: {primary_answer}')
    elif 'knowledge_graph' in results:
        secondary_answer = results['knowledge_graph']['description']
        console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
    else:
        # no answer box and no knowledge graph: handled below as "answer not found"
        raise KeyError('answer_box')
    progress_bar.stop()  # the answer was found -> stop the progress bar

    user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

    if user_prompt_to_continue_if_answer_is_success == 'y':
        recognizer = speech_recognition.Recognizer()
        continue  # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break

except KeyError:
    progress_bar.stop()  # no answer was found -> stop the progress bar
    error_user_prompt = input("Sorry, I didn't find the answer. Would you like to rephrase it? (y/n) ")

    if error_user_prompt == 'y':
        recognizer = speech_recognition.Recognizer()
        continue  # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
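The nested try/except above can also be written as a small pure function that returns None instead of raising, which makes the lookup order easy to test without calling SerpApi. A sketch; extract_answer is hypothetical, and the sample dictionaries below are made-up stand-ins containing only the keys the script reads, not real SerpApi responses:

```python
from typing import Any, Optional


def extract_answer(results: dict[str, Any]) -> Optional[Any]:
    """Return the answer the script would print, or None if it would ask to rephrase."""
    if "answer_box" in results:
        answer_box = results["answer_box"]
        # same order as the try/except: "answer" first, then "result"
        return answer_box.get("answer", answer_box.get("result"))
    if "knowledge_graph" in results:
        return results["knowledge_graph"].get("description")
    return None  # neither block present: the KeyError branch in the script


print(extract_answer({"answer_box": {"result": "42"}}))  # 42
print(extract_answer({"organic_results": []}))  # None
```

In the loop, answer = extract_answer(results) followed by a check for answer is None would replace the KeyError handling.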

The final step is to handle the case where the speech can't be recognized: recognize_google raises speech_recognition.UnknownValueError when it can't understand the audio.

# while True:
#     with console.status(status='Listening to you...', spinner='point') as progress_bar:
#         try:
#             # speech recognition code
#             # data extraction code
        except speech_recognition.UnknownValueError:
            progress_bar.stop()  # the speech wasn't recognized -> stop the progress bar
            user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

            if user_prompt_to_continue == 'y':
                recognizer = speech_recognition.Recognizer()
                continue  # run speech recognition again until the user answers 'n'
            else:
                progress_bar.stop()  # the user wants to quit -> stop the progress bar
                console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                break
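The same y/n prompt appears three times in the script. One way to deduplicate it is a small helper; ask_yes_no is hypothetical, and accepting "yes" and ignoring case and surrounding whitespace goes slightly beyond the original, which only accepts an exact "y". The input function is a parameter so the helper can be tested without a keyboard:

```python
from typing import Callable


def ask_yes_no(prompt: str, input_func: Callable[[str], str] = input) -> bool:
    """Ask a y/n question; anything other than 'y' or 'yes' counts as 'no'."""
    reply = input_func(prompt).strip().lower()
    return reply in ("y", "yes")
```

Each branch could then read if ask_yes_no('Would you like to search for something again? (y/n) '): followed by continue, with break in the else branch.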

console.rule() will provide the following output:

────────────────────── Thank you for checking SerpApi Voice Assistant Demo Project ──────────────────────

Finally, add the if __name__ == '__main__' idiom, which calls main() (and runs the entire script) only when the file is executed directly, not when it is imported as a module:

if __name__ == '__main__':
    main()
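pyttsx3 is imported at the top of the script, but the code above never calls it, so answers are only printed. If you also want the assistant to say the answer out loud, pyttsx3's init(), say(), and runAndWait() can do that. A minimal sketch, assuming a working speech driver on your system (for example, espeak on Linux); speak is a hypothetical helper, and its engine parameter exists only so the function can be exercised without audio:

```python
from typing import Any, Optional


def speak(text: str, engine: Optional[Any] = None) -> None:
    """Say text out loud with pyttsx3 (imported lazily, only when no engine is passed)."""
    if engine is None:
        import pyttsx3  # requires `pip install pyttsx3` and a system speech driver

        engine = pyttsx3.init()
    engine.say(text)  # queue the text to be spoken
    engine.runAndWait()  # block until the queued speech has finished
```

For example, speak(str(primary_answer)) could follow the console.print call that shows the answer.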

5. Links

rich
pyttsx3
SpeechRecognition
google-search-results
os
dotenv
