(22) Text-to-speech, TTS, long text, Edge-TTS

The code in this article uses Edge-TTS for text-to-speech and can save the output as mp3 or wav files. There is no limit on text length. It calls the cloud Edge-TTS interface; I just wrote a simple wrapper around it and built a UI. Directly executable files […]

paddle-speech subtitles videos

Streaming speech recognition based on PaddlePaddle. Environment deployment: only the WebSocket protocol is supported, not HTTP. Install the environment: git clone https://github.com/PaddlePaddle/PaddleSpeech.git; cd PaddleSpeech; pip install pytest-runner; pip install . Install the CPU version of paddlepaddle: pip install paddlepaddle==2.5.1 -i https://mirror.baidu.com/pypi/simple. The GPU version is installed according to the CUDA version, and the colab version […]

STM32–SYN6288 speech synthesis module

Foreword: Voice modules are among the most common modules in our learning projects. Today I will share the basic use of the SYN6288 module. For the software part, I provide complete code for the stm32f103zet6/stm32f407zgt6 for your reference. For in-depth study, you also need to carefully read the data sheets and other […]

Building OpenAI Whisper for speech-to-text on an ARM-architecture smart box (T906G) running Ubuntu 20.04

Foreword: The ARM architecture is really not fun. For strange errors you can't rely on Baidu alone; Google is a must, and don't be afraid of foreign-language blogs. Main text. 1. Hardware introduction: The picture shows the smart box with its built-in Ubuntu 20.04 system. The default built-in Python is 3.8. There is an NVIDIA graphics card, […]

Ubuntu22.04 local deployment of PaddleSpeech experimental code (GPU version)

Foreword: I previously did a project on locally deploying the PaddleSpeech experimental code (CPU version) on Ubuntu 18.04.6. Because it was the CPU version, synthesis and training were painfully slow. Building on that experience, I deployed the GPU version. To be honest, although it takes a lot less time […]

Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

1. Model architecture: The fully convolutional time-domain audio separation network (Conv-TasNet) consists of three processing stages, as shown in (A): encoder, separation, and decoder. First, an encoder module transforms short segments of the mixture waveform into their corresponding representations in an intermediate feature space. This representation is then used to estimate the […]

How to implement text reading or text-to-speech in JavaScript

Foreword: While working on projects, I often encounter scenarios where customers require voice playback, such as accessible reading, reading an entire article aloud, and text-to-speech playback. Without using a third-party API, JavaScript is needed to implement the text-to-speech playback function. What I can think of is to use the API of […]