Innovative Uses of Amazon Transcribe in Amazon Cloud AI Large Language Model Applications

Introduction to Transcribe

Speech recognition, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences. The technology has been under development for decades, but its major breakthrough came in 2009, when Hinton introduced deep learning methods into speech recognition.

Amazon Transcribe is an automatic speech recognition (ASR) service that enables developers to easily add speech-to-text capabilities to their applications. Since its launch at re:Invent 2017, more and more users have added speech recognition capabilities to their apps and devices. In August 2019, Amazon Transcribe launched support for Mandarin Chinese. What’s even more exciting for users is that this service is also supported in China’s Beijing region (BJS) and Ningxia region (ZHY).

In daily work and study, we often need to add subtitles to a video file. The traditional method is to write down the dialogue in the video as text (usually in a tool such as Notepad), use other software to add a timeline to the text, and then proofread the result manually. The whole process takes a lot of time and energy. Is there a faster way? Below we share an example of using Amazon Transcribe to automatically add subtitles to a video.

The Amazon Transcribe service uses machine learning to recognize speech in audio files and convert it into text. Early versions supported only English and Spanish; more languages, including Mandarin Chinese as noted above, have been added since. For a batch job, the audio file must be stored in S3, and the output is also saved to S3.

A transcription job takes the following inputs:

An input sound file in FLAC, MP3, MP4, or WAV format, no longer than 2 hours.
The language of the audio.

Several special features:

Speaker identification: Transcribe can distinguish multiple speakers in a speech file. It supports 2 to 10 speakers.
Multi-channel support (channel identification): if the sound file contains multiple channels, Transcribe can transcribe each channel separately.
Custom vocabulary support: helps with words that would otherwise not be recognized, such as terms rarely used outside a specific field.
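
The three features above are switched on through the Settings argument of StartTranscriptionJob. The following is a minimal sketch of how such a settings dictionary might be assembled. The helper name build_settings is hypothetical, the 2 to 10 speaker range is taken from the text above, and the rule that speaker labels and channel identification cannot be combined in one request reflects my understanding of the service, so check it against the current documentation.

```python
def build_settings(max_speakers=None, channel_identification=False, vocabulary_name=None):
    """Build a Settings dict for start_transcription_job (illustrative helper)."""
    # Assumption: the service rejects requests that enable both features at once.
    if max_speakers is not None and channel_identification:
        raise ValueError("speaker labels and channel identification cannot be combined")

    settings = {}
    if max_speakers is not None:
        # Range taken from the article text (2 to 10 speakers).
        if not 2 <= max_speakers <= 10:
            raise ValueError("max_speakers must be between 2 and 10")
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if channel_identification:
        settings["ChannelIdentification"] = True
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name
    return settings
```

The returned dictionary would be passed as Settings=build_settings(...) alongside the other arguments of start_transcription_job.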

System Architecture

A new file in the S3 bucket is detected and triggers the Lambda function;
The Lambda function calls the Transcribe service to generate the transcript of the video (JSON format);
The transcript is converted into a subtitle file (SRT format);
The subtitle file is uploaded to the bucket.
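
The third step, converting the Transcribe JSON into an SRT file, can be sketched roughly as follows. This is not the article's own converter. It assumes the usual batch output layout, in which results.items holds one entry per word or punctuation mark, with start_time and end_time given as strings of seconds on word entries only. The function names and the fixed words-per-cue rule are illustrative choices.

```python
def format_timestamp(seconds):
    """Convert seconds (str or float) to the SRT form HH:MM:SS,mmm."""
    millis = int(round(float(seconds) * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def transcript_to_srt(transcript, max_words=10):
    """Group recognized words into cues of at most max_words words each."""
    cues = []  # each cue is [start, end, text]
    words, start, end = [], None, None
    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation carries no timestamps; attach it to the previous word.
            if words:
                words[-1] += content
            elif cues:
                cues[-1][2] += content
            continue
        if start is None:
            start = item["start_time"]
        end = item["end_time"]
        words.append(content)
        if len(words) >= max_words:
            cues.append([start, end, " ".join(words)])
            words, start, end = [], None, None
    if words:
        cues.append([start, end, " ".join(words)])

    blocks = []
    for index, (cue_start, cue_end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(cue_start)} --> {format_timestamp(cue_end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

A real subtitle tool would probably break cues at sentence boundaries or pauses rather than at a fixed word count; the fixed count just keeps the sketch short.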

Console operation display

Log in to your AWS account, open the AWS Management Console, and search for Transcribe to reach the service console.

Click the Create job button to use the speech-to-text service, and fill in the required parameters as prompted.

API operations

StartTranscriptionJob: Start a transcription job
ListTranscriptionJobs: Get the list of jobs
GetTranscriptionJob: Get a job
CreateVocabulary: Create a vocabulary
DeleteVocabulary: Delete a vocabulary
GetVocabulary: Get a vocabulary
ListVocabularies: Get the list of vocabularies
UpdateVocabulary: Update a vocabulary
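
As an illustration of how the vocabulary operations fit together, here is a small sketch that creates a vocabulary if it does not exist and updates it otherwise. The function name upsert_vocabulary is hypothetical. The client argument is expected to be a boto3 Transcribe client, that is, boto3.client('transcribe'). For brevity the sketch ignores pagination of ListVocabularies and does not wait for the vocabulary to reach the READY state.

```python
def upsert_vocabulary(client, name, phrases, language_code="en-US"):
    """Create the custom vocabulary if it is missing, otherwise update it."""
    # NameContains is a substring filter, so compare exact names afterwards.
    response = client.list_vocabularies(NameContains=name)
    existing = {v["VocabularyName"] for v in response.get("Vocabularies", [])}

    if name in existing:
        client.update_vocabulary(
            VocabularyName=name, LanguageCode=language_code, Phrases=phrases
        )
        return "updated"

    client.create_vocabulary(
        VocabularyName=name, LanguageCode=language_code, Phrases=phrases
    )
    return "created"
```

Passing the client in, rather than creating it inside the function, makes the helper easy to exercise with a stub object when no AWS credentials are available.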

Python example: using Transcribe

Type 1: batch transcription job

import time
import boto3

transcribe = boto3.client('transcribe')
job_name = "testTranscribeJob100"
job_uri = "https://s3.dualstack.us-east-1.amazonaws.com/*****/hellosammy.mp3"

# Start an asynchronous transcription job for the MP3 file stored in S3
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',
    LanguageCode='en-US',
)

# Poll until the job finishes, whether it succeeds or fails
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Job not ready yet...")
    time.sleep(5)

print(status)
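
The polling loop in this example runs until the job finishes, however long that takes. One possible variation, sketched below, adds a timeout. The function name wait_for_job and its default values are illustrative assumptions; client is expected to be a boto3 Transcribe client, and the sleep parameter exists only so the function can be exercised without real waiting.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll GetTranscriptionJob until the job ends or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_name} still running after {timeout_seconds}s")
        sleep(poll_seconds)
```

The caller can then inspect the returned job's TranscriptionJobStatus and, on failure, its FailureReason.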

Type 2: streaming transcription over WebSocket

Install Python packages

pip3 install boto3
pip3 install amazon_transcribe
pip3 install websocket-client

Imports

import hashlib
import hmac
import urllib.parse
from datetime import datetime
import time
import ssl
import json
import websocket
import _thread
from amazon_transcribe.eventstream import EventStreamMessageSerializer
from amazon_transcribe.eventstream import EventStreamBuffer
from boto3.session import Session

Create URL function

def sign(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()
 
def getSignatureKey(key, dateStamp, region, serviceName):
    kDate = sign(("AWS4" + key).encode("utf-8"), dateStamp)
    kRegion = sign(kDate, region)
    kService = sign(kRegion, serviceName)
    kSigning = sign(kService, "aws4_request")
    return kSigning
 
def create_pre_signed_url(region, language_code, media_encoding, sample_rate):
    # Get access key and secret key
    credentials = Session().get_credentials()
    access_key_id = credentials.access_key
    secret_access_key = credentials.secret_key
 
    method = "GET"
    service = "transcribe"
    endpoint = "wss://transcribestreaming." + region + ".amazonaws.com:8443"
    host = "transcribestreaming." + region + ".amazonaws.com:8443"
    algorithm = "AWS4-HMAC-SHA256"
 
    t = datetime.utcnow()
    amz_date = t.strftime('%Y%m%dT%H%M%SZ')
    datestamp = t.strftime('%Y%m%d')
 
    canonical_uri = "/stream-transcription-websocket"
 
    canonical_headers = "host:" + host + "\n"
    signed_headers = "host"
 
    credential_scope = datestamp + "/" + region + "/" + service + "/" + "aws4_request"
 
    canonical_querystring = "X-Amz-Algorithm=" + algorithm
    canonical_querystring += "&X-Amz-Credential=" + urllib.parse.quote_plus(access_key_id + "/" + credential_scope)
    canonical_querystring += "&X-Amz-Date=" + amz_date
    canonical_querystring += "&X-Amz-Expires=300"
    canonical_querystring += "&X-Amz-SignedHeaders=" + signed_headers
    canonical_querystring += "&language-code=" + language_code + "&media-encoding=" + media_encoding + "&sample-rate=" + sample_rate
 
    # Zero length string for connecting
    payload_hash = hashlib.sha256(("").encode('utf-8')).hexdigest()
 
    canonical_request = method + '\n' \
                         + canonical_uri + '\n' \
                         + canonical_querystring + '\n' \
                         + canonical_headers + '\n' \
                         + signed_headers + '\n' \
                         + payload_hash
 
    string_to_sign = algorithm + "\n" \
                      + amz_date + "\n" \
                      + credential_scope + "\n" \
                      + hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
 
    signing_key = getSignatureKey(secret_access_key, datestamp, region, service)
 
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
 
    canonical_querystring += "&X-Amz-Signature=" + signature
 
    request_url = endpoint + canonical_uri + "?" + canonical_querystring
 
    return request_url

main function

def main():
    url = create_pre_signed_url("us-east-1", "en-US", "pcm", "16000")
    ws = websocket.create_connection(url, sslopt={"cert_reqs": ssl.CERT_NONE})
 
    _thread.start_new_thread(loop_receiving, (ws,))
    print("Receiving...")
    send_data(ws)
 
    while True:
        time.sleep(1)
main()

loop_receiving function

This function must be defined before main() is called. It receives the data returned by the Amazon Transcribe Streaming Service and prints it.

def loop_receiving(ws):
    try:
        while True:
            result = ws.recv()
 
            if result == '':
                continue
 
            eventStreamBuffer = EventStreamBuffer()
 
            eventStreamBuffer.add_data(result)
            eventStreamMessage = eventStreamBuffer.next()
 
            stream_payload = eventStreamMessage.payload
 
            transcript = json.loads(bytes.decode(stream_payload, "UTF-8"))
 
            print("response:",transcript)
 
            results = transcript['Transcript']['Results']
            if len(results)>0:
                for length in range(len(results)):
                    if 'IsPartial' in results[length]:
                        print('IsPartial:', results[length]['IsPartial'])
 
                    if 'Alternatives' in results[length]:
                        alternatives = results[length]['Alternatives']
                        if len(alternatives)>0:
                            for sublength in range(len(alternatives)):
                                if 'Transcript' in alternatives[sublength]:
                                    print('Transcript:', alternatives[sublength]['Transcript'])
 
 
    except Exception as e:
        if 'WebSocketConnectionClosedException' == e.__class__.__name__:
            print("Error: websocket connection is closed")
        else:
            print(f"Exception Name: {e.__class__.__name__}")
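
The nested loops above print every result, including partial ones that the service later revises. As a hedged sketch, a small helper like the following could keep only the finalized text from one decoded event. The name final_transcripts is hypothetical, and the dictionary layout (Transcript, Results, IsPartial, Alternatives) is the one used in the code above.

```python
def final_transcripts(event):
    """Return the text of all non-partial results in one decoded transcript event."""
    texts = []
    for result in event.get("Transcript", {}).get("Results", []):
        # Partial results may still change, so skip them.
        if result.get("IsPartial", False):
            continue
        alternatives = result.get("Alternatives", [])
        # Take the first alternative when one is present.
        if alternatives and "Transcript" in alternatives[0]:
            texts.append(alternatives[0]["Transcript"])
    return texts
```

Inside loop_receiving it would be called on the transcript dictionary after json.loads, for example to collect subtitle lines instead of printing every intermediate hypothesis.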

send_data function

This function must also be defined before main() is called. It sends audio data to the Amazon Transcribe Streaming Service. The testFile variable is the path of the test audio file. The test audio is in PCM format, in English, with a sampling rate of 16000 Hz.

def send_data(ws):
 
    testFile = "xxx.pcm"
 
    bufferSize = 1024*16
 
    stream_headers = {
        ":message-type": "event",
        ":event-type": "AudioEvent",
        ":content-type": "application/octet-stream",
    }
 
    eventstream_serializer = EventStreamMessageSerializer()
 
    with open(testFile, "rb") as source:
        while True:
            audio_chunk = source.read(bufferSize)
            # Encode audio data
            event_bytes = eventstream_serializer.serialize(stream_headers, audio_chunk)
 
            ws.send(event_bytes, opcode = 0x2) # opcode 0x2 = binary frame
 
            # end with b'' data bytes
            if len(audio_chunk) == 0:
                break
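
To judge how much audio each 16 KB chunk carries, the arithmetic can be sketched as below. It assumes 16-bit (2 bytes per sample) mono PCM, which is an assumption about the test file rather than something the article states, and both function names are illustrative.

```python
def chunk_duration_seconds(chunk_bytes, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Seconds of audio contained in a chunk of raw PCM data."""
    return chunk_bytes / (sample_rate * bytes_per_sample * channels)


def bytes_for_milliseconds(millis, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Number of PCM bytes needed to hold the given duration in milliseconds."""
    return sample_rate * bytes_per_sample * channels * millis // 1000
```

Under these assumptions the 1024*16 byte buffer used above holds about half a second of audio. When streaming from a live source rather than a file, a smaller chunk such as bytes_for_milliseconds(100) may be a reasonable starting point.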

Java example: using Transcribe

import com.amazonaws.AmazonServiceException;
import com.amazonaws.SdkClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;
import com.amazonaws.services.transcribe.AmazonTranscribe;
import com.amazonaws.services.transcribe.AmazonTranscribeClientBuilder;
import com.amazonaws.services.transcribe.model.*;
 
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
 
/**
 * Note: the job output is written only to S3, so to keep a local copy the result is
 * first saved to S3 and then downloaded. This takes time, because the code has to
 * check repeatedly until the output object exists.
 *
 * @author DELL
 * @Desc Upload a local MP3 to S3, transcribe it, and save the resulting JSON file locally.
 * The basic steps:
 * 1. Establish the S3 client connection
 * 2. Upload the local audio to the S3 bucket and return its S3 address
 * 3. Upload the vocabulary (word list) corresponding to the MP3
 * 4. Create a job in Amazon Transcribe and save the generated JSON file to the same directory as the MP3
 */
public class Mp3ToJsonUtils {
 
    // Execution file configuration information
    private static String FILE_TYPE = "mp3";
    // S3 configuration information
    private static String AWS_ACCESS_KEY = "Generate it yourself";
    private static String AWS_SECRET_KEY = "Generate it yourself";
    private static final String BUCKET_NAME = "Generate yourself";
    private static final String JOB_BUCKET_NAME = "Generate it yourself";
    //Aws object information
    private static AmazonS3 s3;
    private static TransferManager tx;
    private static AmazonTranscribe amazonTranscribe;
    private static BasicAWSCredentials awsCredentials;
 
    static {
        //1. Establish connection
        try {
            init_with_key();
        } catch (Exception e) {
            e.printStackTrace();
        }
        awsCredentials = new BasicAWSCredentials(AWS_ACCESS_KEY, AWS_SECRET_KEY);
        amazonTranscribe = AmazonTranscribeClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(awsCredentials)).withRegion(Regions.US_EAST_2).build();
    }
 
    public static void main(String[] args) throws Exception {
        List<String> list = new ArrayList<>();
        mp3TOJosn("C:\\Users\\DELL\\Desktop\\BK test data\\A_Cinderella_Atlas_5.mp3", list);
    }
 
    public static void mp32Josn(String inPath, String savePath, List<String> list) throws Exception {
        String jsonPath = new File(inPath).getParent();
        String name = new File(inPath).getName().replaceAll(" ", "_").replaceAll("-", "_");
        File file = new File(savePath + File.separator + name + ".json");
        //Make sure to upload once
        if (file.exists()) {
            System.out.println(savePath + "--->has been processed, skip processing");
            return;
        }
        //2. Upload the file to the S3 library and get the S3 URL corresponding to the uploaded file.
        String s3Path = uploadFileToBucket(inPath, BUCKET_NAME);
        String key = new File(s3Path).getName();
        key = key.replaceAll(" ", "_").replaceAll("-", "_");
        //3. Create Transcription jobs
        createJob(JOB_BUCKET_NAME, FILE_TYPE, key, s3Path);
        //4. Download the json file to the same directory as the local MP3
        // It takes a certain amount of time to create a job and generate json. Wait for a while and then determine whether it exists.
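        // The job writes its output to JOB_BUCKET_NAME, so check that bucket
        boolean flag = isObjectExit(JOB_BUCKET_NAME, key + ".json");
        while (!flag) {
            Thread.sleep(2000); // pause between checks instead of busy-waiting
            flag = isObjectExit(JOB_BUCKET_NAME, key + ".json");
        }
        amazonS3Downloading(s3, JOB_BUCKET_NAME, key + ".json", savePath + File.separator + key + ".json");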
    }
 
    /**
     * Convert MP3 to Json file and save locally
     *
     * @param inPath
     * @throws Exception
     */
    public static void mp3TOJosn(String inPath, List<String> list) throws Exception {
        String jsonPath = new File(inPath).getParentFile().getParentFile().getParentFile().getAbsolutePath() + File.separator + "json";
        File file1 = new File(jsonPath);
        if (!file1.exists()) {
            file1.mkdirs();
        }
        mp32Josn(inPath, jsonPath, list);
    }
 
    /**
     * Connect to aws by including access key id and secret access key in the code
     *
     * @throws Exception
     */
    private static void init_with_key() throws Exception {
        AWSCredentials credentials = null;
        credentials = new BasicAWSCredentials(AWS_ACCESS_KEY, AWS_SECRET_KEY);
        s3 = new AmazonS3Client(credentials);
        //Region usWest2 = Region.getRegion(Regions.US_WEST_2);
        //s3.setRegion(usWest2);
        tx = new TransferManager(s3);
    }
 
    /**
     * Upload a local file (corresponding location is path) to the bucket named bucketName
     *
     * @param path The local path of the file to upload
     * @param bucketName The name of the S3 bucket in which the file is stored
     * @return the S3 URI (s3://bucket/key) of the uploaded file, or null if the local file does not exist
     */
    private static String uploadFileToBucket(String path, String bucketName) {
        String keyName = new File(path).getName();
        File fileToUpload = new File(path);
        if (fileToUpload.exists() == false) {
            System.out.println(path + " not exists!");
            return null;
        }
        PutObjectRequest request = new PutObjectRequest(bucketName, fileToUpload.getName(), fileToUpload);
        Upload upload = tx.upload(request);
        while ((int) upload.getProgress().getPercentTransferred() < 100) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        System.out.println(path + " MP3 uploaded successfully!");
        String s3Path = "s3://" + bucketName + "/" + keyName;
        return s3Path;
    }
 
    /**
     * Create a Transcription job
     *
     * @param bucketName The name of the S3 bucket that receives the job output
     * @param fileName The media format, e.g. mp3, mp4
     * @param jobName The name of the job to be created
     * @param S3Path The S3 URI of the MP3 (or other media file) in the S3 bucket
     */
    protected static void createJob(String bucketName, String fileName, String jobName, String S3Path) {
        StartTranscriptionJobRequest startTranscriptionJobRequest = new StartTranscriptionJobRequest();
        Media media = new Media();
        media.setMediaFileUri(S3Path);
        // Set the job parameters. Optional: add
        // .withSettings(new Settings().withVocabularyName("your-vocabulary"))
        // before the last call to apply a custom vocabulary.
        startTranscriptionJobRequest.withMedia(media)
                .withLanguageCode(LanguageCode.EnUS)
                .withMediaFormat(fileName)
                .withOutputBucketName(bucketName)
                .setTranscriptionJobName(jobName);
 
        amazonTranscribe.startTranscriptionJob(startTranscriptionJobRequest);
        GetTranscriptionJobRequest request;
        request = new GetTranscriptionJobRequest();
        request.withTranscriptionJobName(jobName);
        GetTranscriptionJobResult result = amazonTranscribe.getTranscriptionJob(request);
        String status = result.getTranscriptionJob().getTranscriptionJobStatus();
        while (!status.toUpperCase().equals("COMPLETED")) {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            //System.out.println(status);
            result = amazonTranscribe.getTranscriptionJob(request);
            status = result.getTranscriptionJob().getTranscriptionJobStatus();
            if (status.toUpperCase().equals("FAILED")) {
                System.out.println(result.getTranscriptionJob().getTranscriptionJobName() + "---> is failed");
                System.out.println(result.getTranscriptionJob().getTranscriptionJobName() + "--->" + result.getTranscriptionJob().getFailureReason());
                throw new RuntimeException("transcribe failed");
            }
        }
        System.out.println(jobName + "Mp3 Job generated successfully");
    }
    /**
     * Download files on S3 to local
     *
     * @param s3Client s3 client
     * @param bucketName bucket name
     * @param key file name
     * @param targetFilePath local path
     */
    public static void amazonS3Downloading(AmazonS3 s3Client, String bucketName, String key, String targetFilePath) {
        S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
        if (object != null) {
            System.out.println("Content-Type: " + object.getObjectMetadata().getContentType());
            InputStream input = null;
            FileOutputStream fileOutputStream = null;
            byte[] data = null;
            try {
                //Get file stream
                input = object.getObjectContent();
                data = new byte[8192]; // fixed-size buffer; available() may return 0 for a network stream
                int len = 0;
                fileOutputStream = new FileOutputStream(targetFilePath);
                while ((len = input.read(data)) != -1) {
                    fileOutputStream.write(data, 0, len);
                }
                System.out.println(targetFilePath + "json file downloaded successfully");
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (fileOutputStream != null) {
                    try {
                        fileOutputStream.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                if (input != null) {
                    try {
                        input.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
 
    /**
     * Determine whether the bucket named bucketName contains an object named key
     *
     * @param bucketName
     * @param key
     * @return
     */
    private static boolean isObjectExit(String bucketName, String key) {
        // Note: a single listObjects call returns only the first page of results
        ObjectListing objectListing = s3.listObjects(bucketName);
        for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
            if (key.equals(objectSummary.getKey())) {
                return true;
            }
        }
        return false;
    }
}

Effect demonstration
