(22) Text-to-speech, TTS, long text, Edge-TTS
The code in this article uses Edge-TTS to convert text to speech; the output can be saved as an mp3 or wav file, and there is no limit on text length. It calls the cloud Edge-TTS interface; I just added a simple wrapper around it and built a UI. Directly executable files […]
Tag: speech
BP neural network data classification: speech feature signal classification
Hello everyone, I am Take Me Skiing! The BP neural network, also known as the backpropagation neural network, is a type of artificial neural network (ANN) commonly used for classification and regression tasks. It is a feedforward neural network that usually consists of an input layer, one or more hidden layers, and an output layer. The […]
PaddleSpeech video subtitles
Streaming speech recognition based on PaddlePaddle. Environment deployment: only the WebSocket protocol is supported; HTTP is not. Install the environment: git clone https://github.com/PaddlePaddle/PaddleSpeech.git, cd PaddleSpeech, pip install pytest-runner, pip install . Install the CPU version of paddlepaddle: pip install paddlepaddle==2.5.1 -i https://mirror.baidu.com/pypi/simple The GPU version is installed according to the CUDA version, and the colab version […]
STM32–SYN6288 speech synthesis module
Foreword The voice module is one of the common modules in our learning projects. Today I will share with you the simple use of the SYN6288 module. For the software part, I will provide the complete code of stm32f103zet6/stm32f407zgt6 for your reference. For in-depth study, you also need to carefully read data sheets and other […]
Building OpenAI Whisper on an ARM-architecture smart box (T906G) with Ubuntu 20.04 for speech-to-text
Foreword The ARM architecture is really no fun. You can't rely on Baidu alone for strange errors; Google is a must. Don't be afraid of foreign-language blogs. Text 1. Hardware introduction The picture shows the built-in Ubuntu 20.04 system of the smart box. The default built-in Python is 3.8. There is an NVIDIA graphics card, […]
Ubuntu22.04 local deployment of PaddleSpeech experimental code (GPU version)
Foreword I previously did a project on locally deploying the PaddleSpeech experimental code (CPU version) on Ubuntu 18.04.6. Because it was the CPU version, synthesis and training took a remarkably long time. Building on that experience, I deployed the GPU version. To be honest, although it takes a lot less time […]
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
1. Model architecture The fully convolutional time-domain audio separation network (Conv-TasNet) consists of three processing stages, as shown in (A): encoder, separation, and decoder. First, an encoder module transforms short segments of the mixture waveform into their corresponding representations in an intermediate feature space. This representation is then used to estimate the […]
[Ten years of work experience, speech] Guangzhou City CSDN Developer 1024 Event
Ten years after graduating and starting work, I silently asked myself: has my original wish come true? On the occasion of this decade, I am very happy and honored to participate in the Guangzhou City Developers 1024 Programmers Festival organized by the CSDN platform and to give a speech as a guest. Because I was in […]
Lifelike timbre cloning: making guichu (remix) videos with Bert-vits2 text-to-speech in practice (Python3.10)
Does anyone know which free, open-source TTS project is the most impressive right now? That’s right, it’s Bert-vits2, like no other. It integrates the Bert large model into the already extremely powerful VITS project, which basically solves VITS's tone and rhythm problems. While the results are very good, the cost of training is […]
How to implement read-aloud or text-to-speech in JS
Foreword While working on projects, I often encounter scenarios where customers require voice playback, such as accessible reading, reading an entire article aloud, and text-to-speech playback. Without relying on a third-party API, the text-to-speech playback function has to be implemented in JS itself. What I can think of is to use the API of […]