ROS Machine Voice

Environment: Ubuntu 20.04 + ROS Noetic

Reference: “ROS Robot Development Practice” by Hu Chunxu, which also has accompanying source code:

https://github.com/huchunxu/ros_exploring.git

1. Play voice

Let the robot speak

1. sound_play function package

The ROS metapackage audio_common provides the text-to-speech function package sound_play. Install it and its dependencies using the following commands:

sudo apt-get install ros-noetic-audio-common
sudo apt-get install libasound2
sudo apt-get install mplayer

2. Voice playback test

Start roscore and the sound_play node with the following commands, then test:

roscore
rosrun sound_play soundplay_node.py

1) Play a built-in sound:
Play one of the system's built-in sounds with the following command:

rosrun sound_play playbuiltin.py 2

If you hear two gongs, the test was successful.

2) Play WAV or MP3 sounds:
Using play.py as an example (the WAV path below is mine; change it to your own):

rosrun sound_play play.py /home/fym/Music/666.wav

Or play MP3 files:

rosrun sound_play play.py /home/fym/Music/666.mp3

3) Text-to-speech test: in another terminal, enter the text to be converted into speech:

rosrun sound_play say.py "Greetings Humans. Take me to your leader."

sound_play reads the entered text aloud. The default voice is called kal_diphone; you can change it to another one:

sudo apt-get install festvox-don
rosrun sound_play say.py "Welcome to the future" voice_don_diphone

2. Let the robot understand speech

ROS integrates code from the CMU Sphinx and Festival open-source projects in an independent speech recognition function package, pocketsphinx, which can give our robots speech recognition capabilities.

This section implements the basic functions of speech recognition, and can successfully recognize English voice commands and generate corresponding strings.

1. pocketsphinx function package

Install the required dependencies:

sudo apt-get install ros-noetic-audio-common
sudo apt-get install libasound2
sudo apt-get install gstreamer1.0
sudo apt-get install libsphinxbase3
sudo apt-get install libpocketsphinx3
sudo apt-get install libgstreamer-plugins-base1.0
sudo apt-get install gstreamer1.0-pocketsphinx

In fact, you can simply run:

sudo apt-get install gstreamer1.0-*
sudo apt-get install gstreamer1.0-pocketsphinx
# No need to install the dependencies separately; the previous commands already cover them

An error may be reported during this step (packages missing from the Debian pool for pocketsphinx).

You can add a software source: sudo gedit /etc/apt/sources.list

Add the Tsinghua University Debian mirror (Tsinghua Open Source Mirror), or:

deb http://ftp.de.debian.org/debian bookworm main

Then add the public key (the trailing number is the string shown after NO_PUBKEY in the error message):

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 0E98404D386FA1D9

Then

sudo apt-get update

Alternatively, fix broken dependencies:

sudo apt-get -f install

Or, if the dependency package multiarch-support is missing, download the corresponding version (http://launchpadlibrarian.net/416685704/multiarch-support_2.19-0ubuntu6.15_amd64.deb) and install it:

sudo dpkg -i multiarch-support_2.19-0ubuntu6.15_amd64.deb

After the dependent libraries are installed, use the following command to download the pocketsphinx function package source code from GitHub:

git clone https://github.com/mikeferguson/pocketsphinx.git

After the download completes, compile the function package in your workspace with catkin_make.

The core node of the pocketsphinx function package is recognizer.py. It collects voice input from the microphone, calls the speech recognition library to generate text, and publishes the result on the /recognizer/output topic. Other nodes can subscribe to this topic to obtain the recognition results and process them accordingly.
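For reference, here is a minimal sketch (not from the book) of a node that consumes these results; only the topic name comes from the text, the rest is illustrative:

#!/usr/bin/env python3
import rospy
from std_msgs.msg import String

def output_callback(msg):
    # msg.data holds the text recognized by recognizer.py
    rospy.loginfo("Heard: %s", msg.data)

if __name__ == '__main__':
    rospy.init_node('recognizer_listener')
    rospy.Subscriber('/recognizer/output', String, output_callback)
    rospy.spin()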

2. Speech recognition test

First, plug in the microphone and verify in the system settings that it receives voice input; the input volume should be neither too low nor too high.

Note: you need to install the iFlytek speech recognition SDK (libmsc.so) first; see Section 5.

Then, run the test program from the pocketsphinx package:

roslaunch pocketsphinx robocup.launch

Note that this requires Python 3.

The pocketsphinx function package provides an offline speech recognition model. The default supported models are limited. In the next section we will learn how to add our own speech model.

An error occurred: SyntaxError: Missing parentheses in call to 'print'. Did you mean print(data_path)?

The error comes from the different print syntax in Python 2 and Python 3: a Python 2 program is most likely being run with Python 3. There are two ways to solve this.
First method: change the Python version used (default Ubuntu terminal)

First enter python to check the Python version. If it is Python 3, run the script explicitly with python2. Take the Guyuehome example program as an example:

for a in range(5, 10):
    if a < 10:
        print 'a = ', a
        a += 1
    else:
        break

When typing python python_for.py, an error occurs: SyntaxError: Missing parentheses in call to 'print'. Did you mean print('a = ', a)?

When entering python2 python_for.py, it runs smoothly.

Second method: modify the program

Change every print statement into a print() call:

for a in range(5, 10):
    if a < 10:
        print('a = ', a)
        a += 1
    else:
        break

This also runs smoothly.

The second method (keeping Python 3) is recommended.

If a Python import error occurs, such as:

import pygtk
ModuleNotFoundError: No module named 'pygtk'

try the following:

pip install --upgrade pip -vvv
pip install -U setuptools
pip install pyGObject
pip install Pygtk
pip install --upgrade PyGObject
pip install --upgrade Pygtk
sudo apt-get install python-gtk2-dev python-gtk2-tutorial
sudo apt-get install python-gtk2
sudo apt-get install libgtk2.0-dev
sudo apt install libgirepository1.0-dev gcc libcairo2-dev pkg-config python3-dev gir1.2-gtk-3.0
pip3 install PyGObject
sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
pip install gtk -i https://mirror.baidu.com/pypi/simple
pip install pygobject -i https://mirror.baidu.com/pypi/simple
pip install pygst-0.10 -i https://mirror.baidu.com/pypi/simple

This is mainly a Python version problem.

For example, the commands module exists only in Python 2; in Python 3 and above, use subprocess instead, as in the snippet below.
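An illustrative snippet of this replacement (the pacmd command is just the one used later in this section):

# Python 2:
#   import commands
#   status, output = commands.getstatusoutput('pacmd list-sources')
# Python 3:
import subprocess
status, output = subprocess.getstatusoutput('pacmd list-sources')
print(output)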

Alternatively, you can modify the recognizer.py file accordingly.

Another error:

gi.repository.GLib.Error: gst_parse_error: no element "gconfaudiosrc" (1)

This occurs because mic_name is not set in the robocup.launch file; just add it.

Make sure a microphone is present (speakers alone will not work). You can test the microphone under Settings → Sound.

pacmd list-sources    # display the device list

Find the device name corresponding to the microphone in the output, and add mic_name to the demo/robocup.launch file, for example as below.
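The addition might look like the following line, assuming the recognizer node reads a mic_name parameter as the fix above suggests; the device name is a placeholder, so use the one reported by pacmd list-sources:

<param name="mic_name" value="alsa_input.usb-XXXX.analog-mono"/>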

3. Create a voice library

The recognizable information in the speech library is stored in txt files. Create a config folder in the robot_voice function package to hold the voice library files, then create a commands.txt file in that folder and enter the commands you want to recognize.
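For example, commands.txt might simply list the commands used in the turtle teleop section later in this chapter:

go
back
left
right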

Next, generate the language model and template files online from this file on the following website:

http://www.speech.cs.cmu.edu/tools/lmtool-new.html

Following the website prompts, click the “Select File” button, upload the commands.txt file just created, and click the “COMPILE KNOWLEDGE BASE” button to compile.

After compilation, download the “COMPRESSED TARBALL” archive and extract it into the config folder of the robot_voice function package. The extracted .dic and .lm files are the speech template library generated from the recognition commands we designed. Rename all these files to commands.

4. Create launch file

Next, create a launch file that starts the speech recognition node and sets the location of the speech template library: robot_voice/launch/voice_commands.launch

The launch file passes the previously generated speech library files as parameters when running the recognizer.py node, so that your own speech library is used for recognition. In addition, the hmm engine parameter is changed here to an engine that supports more speech models.
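As a sketch, voice_commands.launch might look like the following; the lm and dict parameter names follow the recognizer.py convention, but treat the exact file layout as an assumption to adapt to your package:

<launch>
  <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">
    <!-- speech template library generated from commands.txt -->
    <param name="lm" value="$(find robot_voice)/config/commands.lm"/>
    <param name="dict" value="$(find robot_voice)/config/commands.dic"/>
    <!-- the hmm parameter, if set, points at the acoustic model directory -->
  </node>
</launch>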

5. Voice command recognition

Use the following commands to test voice recognition and check whether the voice commands set in commands.txt are successfully recognized:

roslaunch robot_voice voice_commands.launch
rostopic echo /recognizer/output

6. Chinese speech recognition

The pocketsphinx function package can recognize not only English but also Chinese, using a Chinese speech engine and model. The robot_voice/config folder already contains all configuration files for Chinese speech recognition; the robot_voice/launch/chinese_recognizer.launch file starts the recognizer node and links the required configuration files.

Run the file using the following command to start Chinese recognition:

roslaunch robot_voice chinese_recognizer.launch

For the recognition text supported by the Chinese speech model, refer to the pocketsphinx-cn/model/lm/zh_CN/mandarin_notone.dic file. It contains nearly 100,000 entries covering almost all common Chinese vocabulary, so you can speak Chinese freely and test the recognition effect by printing the results:

rostopic echo /recognizer/output

Although Chinese can be recognized, the accuracy is poor. iFlytek's Chinese speech recognition engine will be used later to achieve more accurate results.

3. Control the robot through voice

The previous section implemented basic speech recognition: English voice commands are recognized and converted into corresponding strings. This section builds a small voice-controlled robot application on top of that; the robot is the turtle in the turtlesim simulation.

1. Write voice control nodes

robot_voice/script/voice_teleop.py

The node subscribes to the /recognizer/output topic through a Subscriber; when a speech recognition result arrives, the callback function does some simple processing and then publishes velocity commands through a Publisher to control the turtle's movement.
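A minimal sketch of what such a node could look like (the speeds and keyword matching here are illustrative assumptions; the book's voice_teleop.py may differ in detail):

#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import Twist
from std_msgs.msg import String

def voice_callback(msg):
    # Map recognized words to turtlesim velocity commands
    twist = Twist()
    cmd = msg.data.lower()
    if 'go' in cmd:
        twist.linear.x = 2.0
    elif 'back' in cmd:
        twist.linear.x = -2.0
    elif 'left' in cmd:
        twist.angular.z = 2.0
    elif 'right' in cmd:
        twist.angular.z = -2.0
    pub.publish(twist)

if __name__ == '__main__':
    rospy.init_node('voice_teleop')
    pub = rospy.Publisher('/turtle1/cmd_vel', Twist, queue_size=1)
    rospy.Subscriber('/recognizer/output', String, voice_callback)
    rospy.spin()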

2. Voice control of turtle movement

Next, you can run this voice control routine and start all nodes with the following command:

roslaunch robot_voice voice_commands.launch
rosrun robot_voice voice_teleop.py
rosrun turtlesim turtlesim_node

Once all the commands have executed successfully, the turtlesim window opens, and you can control the turtle's movement with voice commands such as “go”, “back”, “left”, and “right”.

4. Conversation with the robot

Implement a voice conversation robot application.

The voice dialogue pipeline can be divided into three nodes:

  • Speech recognition node: converts the user's voice into a string.
  • Intelligent matching response node: matches a response string from the database.
  • Text-to-speech node: converts the response string into speech and plays it.

1. Voice recognition

The speech recognition node is based on the pocketsphinx function package. Generate a speech library by the method above, including commonly used conversational sentences; you can also add more sentences as needed.

After the voice library is generated, name all library files chat and place them in the same path as the previous commands files. Then create another launch file, robot_voice/launch/chat_recognizer.launch, which runs the pocketsphinx speech recognition node and sets the path of the speech library.

The pocketsphinx function package publishes the recognized text on the /recognizer/output topic, so create a topic conversion node, robot_voice/scripts/aiml_voice_recognizer.py, that forwards the speech recognition results to the voiceWords topic as std_msgs/String messages.
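A minimal sketch of this relay node (topic names come from the text; the rest is illustrative):

#!/usr/bin/env python3
import rospy
from std_msgs.msg import String

def relay_callback(msg):
    # Forward text recognized by pocketsphinx to the voiceWords topic
    pub.publish(String(data=msg.data))

if __name__ == '__main__':
    rospy.init_node('aiml_voice_recognizer')
    pub = rospy.Publisher('voiceWords', String, queue_size=10)
    rospy.Subscriber('/recognizer/output', String, relay_callback)
    rospy.spin()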

Now you can test the speech recognition function. Enter the following command in the terminal:

roslaunch robot_voice chat_recognizer.launch
rosrun robot_voice aiml_voice_recognizer.py
rostopic echo /voiceWords

Speak into the microphone and you can see the recognized voice string on the terminal.

2. Intelligent matching response

Speech can now be recognized as a string; next, the response text is matched using AIML. This is implemented in the node robot_voice/scripts/aiml_voice_server.py.
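A minimal sketch of what this server node could look like; the aiml_path parameter name and the startup.xml / "load aiml b" bootstrap follow common PyAIML usage and are assumptions, not the book's exact code:

#!/usr/bin/env python3
import aiml
import rospy
from std_msgs.msg import String

def words_callback(msg):
    # Match the recognized text against the AIML database
    pub.publish(String(data=kernel.respond(msg.data)))

if __name__ == '__main__':
    rospy.init_node('aiml_voice_server')
    aiml_path = rospy.get_param('~aiml_path', '.')
    kernel = aiml.Kernel()
    kernel.learn(aiml_path + '/startup.xml')  # bootstrap file that loads the .aiml set
    kernel.respond('load aiml b')             # trigger the bootstrap category
    pub = rospy.Publisher('response', String, queue_size=10)
    rospy.Subscriber('voiceWords', String, words_callback)
    rospy.spin()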

This node needs to load the AIML database path at runtime; create robot_voice/launch/start_aiml_server.launch to load the parameters:
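Under the same assumptions, a sketch of the launch file (the aiml_path parameter name is hypothetical):

<launch>
  <node pkg="robot_voice" type="aiml_voice_server.py" name="aiml_voice_server" output="screen">
    <param name="aiml_path" value="$(find robot_voice)/data"/>
  </node>
</launch>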

Run the following command in the terminal to test:

roslaunch robot_voice start_aiml_server.launch
rostopic echo /response
rostopic pub /voiceWords std_msgs/String "data: 'what is your name'"

You can see the matching response information in the terminal.

3. Text to speech

Now that the response text can be matched, we can use the sound_play function package from earlier to convert the text into speech and play it. The text-to-speech node is implemented in robot_voice/scripts/aiml_tts.py.
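A minimal sketch of the TTS node using sound_play's Python client (SoundClient and its say() method are the sound_play API; the topic name comes from the text, the rest is illustrative):

#!/usr/bin/env python3
import rospy
from std_msgs.msg import String
from sound_play.libsoundplay import SoundClient

def response_callback(msg):
    rospy.loginfo('TTS: %s', msg.data)
    soundhandle.say(msg.data)  # spoken through soundplay_node

if __name__ == '__main__':
    rospy.init_node('aiml_tts')
    soundhandle = SoundClient()
    rospy.sleep(1)  # give the client time to connect to soundplay_node
    rospy.Subscriber('response', String, response_callback)
    rospy.spin()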

Use the following command in the terminal to test:

roscore
rosrun sound_play soundplay_node.py
rosrun robot_voice aiml_tts.py
rostopic pub /response std_msgs/String "data: 'what is your name'"

After a successful run, you will soon hear the text “what is your name” spoken aloud.

4. Intelligent dialogue

We integrate the above three processes to create a complete intelligent voice dialogue application.

Create the robot_voice/launch/start_chat.launch file to start all of the above nodes.
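One plausible composition, assuming chat_recognizer.launch already starts the recognition and relay nodes (the node set and the aiml_path parameter are assumptions):

<launch>
  <node pkg="robot_voice" type="aiml_voice_server.py" name="aiml_voice_server" output="screen">
    <param name="aiml_path" value="$(find robot_voice)/data"/>
  </node>
  <node pkg="robot_voice" type="aiml_tts.py" name="aiml_tts" output="screen"/>
  <node pkg="sound_play" type="soundplay_node.py" name="soundplay_node"/>
</launch>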

Start the speech recognition, intelligent matching response, and text-to-speech nodes with the following commands:

$ roslaunch robot_voice chat_recognizer.launch
$ roslaunch robot_voice start_chat.launch

After successful startup, you can start talking to the robot.

5. Let the robot understand Chinese

In the previous sections we built a voice conversation robot similar to Siri. To communicate with the robot in Chinese, we will use iFlytek's speech recognition SDK.

1. iFlytek speech recognition SDK

  • First, log in to the official website of iFlytek Open Platform (http://www.xfyun.cn/) and register an account using personal information.
  • After logging in to your account, create a new application and name it your_name_ros_voice
  • After the creation is completed, you can see the voice application you just created in “My Applications”
  • Click it and select “Voice Dictation” in the list
  • The page then jumps to the application's voice dictation statistics, where there is an “SDK Download” option. Click it and a download options page appears, already configured according to the application's properties; simply click “Download SDK” at the bottom of the page.
  • Next, copy the iFlytek SDK library file into the system directory so it can be linked during later compilation. Enter the libs folder in the SDK root directory and choose the folder for your platform architecture: “x64” for 64-bit systems, “x86” for 32-bit. After entering the corresponding folder (mine is 64-bit), complete the copy with:
sudo cp libmsc.so /usr/lib/libmsc.so

iFlytek's SDK comes with an APPID, and each download has a different ID. After replacing the SDK, you must update the APPID in the code; if you run the tutorial package, change it to the ID of your downloaded SDK (Ctrl+F will find it in the .cpp files). You can also use the robot_voice/libs/x86/libmsc.so file.

Mainly, modify the APPID in the source files under robot_voice/src/.

2. Voice dictation

To make the robot understand spoken Chinese, this node is based on the “iat_online_record_sample” routine in the iFlytek SDK. Copy the routine's code into the robot_voice function package, modify the main source file iat_online_record_sample, add the required ROS interfaces, and rename the result robot_voice/src/iat_publish.cpp.

After compilation is complete, use the following command to test:

roscore
rosrun robot_voice iat_publish
rostopic echo /voiceWords
rostopic pub /voiceWakeup std_msgs/String "data: 'any string'"

After publishing the wake-up signal (any string), you will see the “Start Listening...” prompt; speak into the microphone, and the results will be published once the online recognition completes.

3. Speech synthesis

The robot now has the basic ability to listen; next we give it the ability to speak. This module is based on the tts_sample routine in the iFlytek SDK. Again, copy the routine's code into the function package, modify the main source file, add a ROS interface, and rename it robot_voice/src/tts_subscribe.cpp

The main() function declares a Subscriber that subscribes to the voiceWords topic and receives the input text string. On successful reception, the callback function voiceWordsCallback() uses the SDK interface to convert the string into Chinese speech.

Then add compilation rules to CMakeLists.txt:
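The rules might look roughly like this (a sketch; exact target names and iFlytek linker flags vary with the SDK version):

add_executable(tts_subscribe src/tts_subscribe.cpp)
target_link_libraries(tts_subscribe
  ${catkin_LIBRARIES}
  # libmsc.so (the iFlytek SDK) plus its usual runtime dependencies
  msc dl pthread
)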

After compilation is complete, use the following command to test:

roscore
rosrun robot_voice tts_subscribe
rostopic pub /voiceWords std_msgs/String "data: 'Hello, I am a robot'"

If the speech synthesis succeeds, the terminal displays a success message.

The robot says “Hello, I am a robot” in standard Mandarin. An error message may appear here, but it does not affect the voice output.

This is a problem caused by mplayer configuration. The solution is to add the following settings to the /etc/mplayer/mplayer.conf file:

sudo gedit /etc/mplayer/mplayer.conf

and add:

lirc=no

Additionally, if mplayer itself reports problems, install:

sudo apt-get install ldb-tools
sudo apt-get install yasm

or install mplayer from source.

If there is a problem with AO OSS /dev/dsp, then:

sudo gedit ~/.mplayer/config

and write:

ao=alsa

4. Intelligent voice assistant

Now the robot can listen and speak, but it does not yet have intelligent data processing. Next, we add some data processing on top of the above code to give the robot simple intelligence, so that it can hold basic Chinese conversations.

We modify the tts_subscribe.cpp code, add some functional code in the voiceWordsCallback() callback function, and name the result robot_voice/src/voice_assistant.cpp

In the code, a series of if/else statements determine the meaning of the Chinese voice input. When we ask questions such as “Who are you”, “What can you do”, or “What time is it”, the robot can, for example, fetch the current system time and answer our question.

Then add compilation rules to CMakeLists.txt (analogous to those for tts_subscribe above):

After compilation is complete, use the following command to test:

roscore
rosrun robot_voice iat_publish
rosrun robot_voice voice_assistant
rostopic pub /voiceWakeup std_msgs/String "data: 'any string'"

After voice wake-up, we can ask questions to the robot.

Here we used Chinese voice interaction only as an example of a very simple robot voice application. The focus is on learning how to integrate the iFlytek SDK with the ROS system, to help realize more complex machine voice functions.

6. Artificial Intelligence Markup Language

Artificial Intelligence Markup Language (AIML) is an XML language for creating natural language processing software agents. AIML is mainly used to implement the language communication function of robots. Users can speak to the robot, and the robot can also give a smart answer through a natural language software agent.

Python has an open-source AIML parsing module, PyAIML. It builds a directed pattern tree by scanning AIML files and then matches user input through depth-first search. First, a brief introduction to this module:

sudo apt-get install python-aiml

Check whether the installation is successful:

cd /path_to/robot_voice/data
python3
>>> import aiml

If no error is reported, the installation is successful.
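A quick interactive check of the module (startup.xml and the "load aiml b" pattern are illustrative; any AIML set with a matching bootstrap category works):

import aiml

kernel = aiml.Kernel()
kernel.learn('startup.xml')    # bootstrap file that <learn>s the .aiml files
kernel.respond('load aiml b')  # trigger the bootstrap category
print(kernel.respond('what is your name'))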

I used this instead:

pip install python-aiml

(the python-aiml-0.9.3.zip, 2.1 MB version)

If it is difficult to download, you can add a mirror source:

sudo pip install -r tools/pip-requires -i https://mirrors.aliyun.com/pypi/simple

AIML originally did not support Chinese; the developer yaleimeng (https://github.com/yaleimeng) ported it to the Chinese context. You can clone it directly from his repository (https://github.com/yaleimeng/py3Aiml_Chinese). The project runs under Python 3; no installation is required, just put the source code in your project directory and run it.

The repository description: the official py3AIML is English-only; this fork adds Chinese support and translates the code comments into Chinese. In the author's tests, aiml files with Chinese patterns and templates parse normally.

Notes on the available packages: pip install python-aiml installs version 0.9.1; its core code works and the English template library is available. pip install aiml cannot be used directly, but it ships with Alice's English template library. To find related resources: pip search aiml.

As is well known, text encoding in Python 2 is a minefield, so Python 3 is the best choice for text processing. Hence, after reading through the entire Python 2 version of the Chinese-supporting aiml project, the author set to work on the Python 3 port.

Error displayed:

/usr/bin/env: 'python': No such file or directory

Solution:

Either Python is not installed; install Python 3.8:

sudo apt-get install python3.8

If Python 3.8 is installed, configure a python symlink:

# Find the location of Python 3.8
whereis python3.8

# Create the symlink
cd /usr/bin
sudo ln -s /usr/bin/python3.8 python

Reference: ROS Advanced (4): Machine Speech Project in Practice – Zhihu

7. Summary of this chapter

Through studying this chapter, we learned the following:

  • pocketsphinx function package: implements English speech recognition.
  • sound_play function package: implements English text-to-speech.
  • Artificial Intelligence Markup Language (AIML): implements the robot's language communication function.
  • iFlytek SDK: an important development tool for Chinese speech recognition and synthesis.
