Introduction to Transcribe
Speech recognition, also known as Automatic Speech Recognition (ASR), converts the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences. The technology has been under development for decades, but it was not until 2009, when Hinton brought deep learning into speech recognition, that it made a major breakthrough.
Amazon Transcribe is an automatic speech recognition (ASR) service that enables developers to easily add speech-to-text capabilities to their applications. Since its launch at re:Invent 2017, more and more users have added speech recognition capabilities to their apps and devices. In August 2019, Amazon Transcribe launched support for Mandarin Chinese. What’s even more exciting for users is that this service is also supported in China’s Beijing region (BJS) and Ningxia region (ZHY).
In daily work and study, we often need to add subtitles to a video file. The traditional method is to transcribe the dialogue by hand, usually saving the text in a tool such as Notepad, then use other software to add timestamps to the text, and finally proofread the result manually. The whole process takes a lot of time and energy. Is there a faster way? Below we share an example of using Amazon Transcribe to automatically add subtitles to a video.
Amazon Transcribe uses machine learning to recognize the speech in an audio file and convert it into text. At launch it supported English and Spanish; other languages, including Mandarin Chinese as mentioned above, were added later. For a batch job the audio file must be stored in S3, and the output is also written to S3. A transcription job takes two inputs:
-
The input audio file. The FLAC, MP3, MP4, and WAV formats are supported, and the audio cannot be longer than 2 hours.
-
The language of the audio.
Several special features:
-
Speaker identification: Transcribe can distinguish multiple speakers in an audio file. It supports 2 to 10 speakers.
-
Multi-channel support (channel identification): if the audio file contains multiple channels, each channel can be transcribed separately.
-
Custom vocabulary: lets you supply words that would otherwise not be recognized, such as uncommon terms from a specific field.
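The three features above are switched on through the optional Settings parameter of StartTranscriptionJob. The following is a minimal sketch of how that parameter might be assembled; the helper name build_settings and the vocabulary name are hypothetical, and the speaker-count limits are left to the service to validate.

```python
def build_settings(max_speakers=None, channel_identification=False, vocabulary_name=None):
    """Build the optional Settings dict for a batch transcription job.

    build_settings is an illustrative helper, not part of boto3.
    """
    if max_speakers is not None and channel_identification:
        # Speaker labels and channel identification cannot be requested together.
        raise ValueError("speaker labels and channel identification cannot be combined")

    settings = {}
    if max_speakers is not None:
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if channel_identification:
        settings["ChannelIdentification"] = True
    if vocabulary_name:
        # The vocabulary must already exist (see CreateVocabulary).
        settings["VocabularyName"] = vocabulary_name
    return settings
```

The result would be passed as Settings=build_settings(max_speakers=2) alongside the other arguments of start_transcription_job; an empty dict means no optional feature is enabled and the parameter can be omitted.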
System Architecture
-
Detect file changes in the S3 bucket and trigger the Lambda function;
-
The Lambda function calls the Transcribe service to generate the text of the video (in JSON format);
-
Convert the JSON transcript into the SRT subtitle file format;
-
Upload the subtitle file to the bucket.
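The conversion step in the list above can be sketched as follows. This is only an illustration, assuming the transcript JSON has the usual results.items layout (word items with start_time and end_time, punctuation items without timing); the function names and the simple rule of starting a new subtitle every max_words words are assumptions, not the article's actual Lambda code.

```python
def srt_timestamp(seconds):
    """Format seconds (str or float) as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = int(round(float(seconds) * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def transcript_to_srt(transcript, max_words=8):
    """Group the word items of a transcript into numbered SRT cues."""
    cues = []            # each cue is [start_time, end_time, text]
    words_in_cue = 0
    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation carries no timing; attach it to the current cue.
            if cues:
                cues[-1][2] += content
            continue
        if not cues or words_in_cue >= max_words:
            cues.append([item["start_time"], item["end_time"], content])
            words_in_cue = 1
        else:
            cues[-1][1] = item["end_time"]
            cues[-1][2] += " " + content
            words_in_cue += 1

    blocks = [
        f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for index, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

A real subtitle generator would probably also break cues at sentence boundaries or long pauses; the fixed word count keeps the sketch short.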
Console walkthrough
-
Sign in to your AWS account, open the AWS Management Console, and search for Transcribe to open the service console.
-
Click the Create job button to start a speech-to-text job, and fill in the required parameters as prompted.
API operations
-
StartTranscriptionJob: start a transcription job
-
ListTranscriptionJobs: list transcription jobs
-
GetTranscriptionJob: get a transcription job
-
CreateVocabulary: create a custom vocabulary
-
DeleteVocabulary: delete a custom vocabulary
-
GetVocabulary: get a custom vocabulary
-
ListVocabularies: list custom vocabularies
-
UpdateVocabulary: update a custom vocabulary
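As an illustration of one of these operations, ListTranscriptionJobs returns results in pages, so a caller usually has to follow NextToken. The sketch below assumes a boto3 Transcribe client is passed in; the helper name list_all_job_names is hypothetical.

```python
def list_all_job_names(client, status=None):
    """Collect every transcription job name, following NextToken across pages.

    `client` is expected to behave like boto3.client("transcribe").
    """
    kwargs = {"Status": status} if status else {}
    names = []
    while True:
        page = client.list_transcription_jobs(**kwargs)
        for summary in page.get("TranscriptionJobSummaries", []):
            names.append(summary["TranscriptionJobName"])
        token = page.get("NextToken")
        if not token:
            return names
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = token
```

With real credentials it could be called as list_all_job_names(boto3.client("transcribe"), status="COMPLETED").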
Python demonstration of Transcribe
Type 1: batch transcription job
import time
import boto3

transcribe = boto3.client('transcribe')
job_name = "testTranscribeJob100"
job_uri = "https://s3.dualstack.us-east-1.amazonaws.com/*****/hellosammy.mp3"

# Start the job; the audio file must already be in S3.
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',
    LanguageCode='en-US')

# Poll until the job has either completed or failed.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Job not ready yet...")
    time.sleep(5)

print(status)
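The polling loop in the Type 1 example runs until the job ends, however long that takes. A possible variation, sketched here under the assumption that a boto3 Transcribe client is passed in, adds a timeout and returns the transcript location; wait_for_job and its parameters are illustrative names.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll GetTranscriptionJob until the job finishes or the timeout elapses.

    Returns the TranscriptFileUri of a completed job.
    """
    waited = 0
    while True:
        job = client.get_transcription_job(TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_name} still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep argument exists only so that the waiting can be replaced in tests; in normal use the default time.sleep is enough.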
Type 2: streaming transcription over WebSocket
-
Install the Python packages
pip3 install boto3
pip3 install amazon_transcribe
pip3 install websocket-client
-
Imports
import hashlib
import hmac
import urllib.parse
from datetime import datetime
import time
import ssl
import json
import websocket
import _thread
from amazon_transcribe.eventstream import EventStreamMessageSerializer
from amazon_transcribe.eventstream import EventStreamBuffer
from boto3.session import Session
-
Function that creates the pre-signed URL
def sign(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def getSignatureKey(key, dateStamp, region, serviceName):
    # Derive the Signature Version 4 signing key step by step.
    kDate = sign(("AWS4" + key).encode("utf-8"), dateStamp)
    kRegion = sign(kDate, region)
    kService = sign(kRegion, serviceName)
    kSigning = sign(kService, "aws4_request")
    return kSigning


def create_pre_signed_url(region, language_code, media_encoding, sample_rate):
    # Get access key and secret key
    credentials = Session().get_credentials()
    access_key_id = credentials.access_key
    secret_access_key = credentials.secret_key

    method = "GET"
    service = "transcribe"
    endpoint = "wss://transcribestreaming." + region + ".amazonaws.com:8443"
    host = "transcribestreaming." + region + ".amazonaws.com:8443"
    algorithm = "AWS4-HMAC-SHA256"

    t = datetime.utcnow()
    amz_date = t.strftime('%Y%m%dT%H%M%SZ')
    datestamp = t.strftime('%Y%m%d')

    canonical_uri = "/stream-transcription-websocket"
    canonical_headers = "host:" + host + "\n"
    signed_headers = "host"
    credential_scope = datestamp + "/" + region + "/" + service + "/" + "aws4_request"

    # Query parameters must appear in sorted order.
    canonical_querystring = "X-Amz-Algorithm=" + algorithm
    canonical_querystring += "&X-Amz-Credential=" + urllib.parse.quote_plus(access_key_id + "/" + credential_scope)
    canonical_querystring += "&X-Amz-Date=" + amz_date
    canonical_querystring += "&X-Amz-Expires=300"
    canonical_querystring += "&X-Amz-SignedHeaders=" + signed_headers
    canonical_querystring += "&language-code=" + language_code + "&media-encoding=" + media_encoding + "&sample-rate=" + sample_rate

    # Zero length string for connecting
    payload_hash = hashlib.sha256(("").encode('utf-8')).hexdigest()

    canonical_request = method + '\n' \
        + canonical_uri + '\n' \
        + canonical_querystring + '\n' \
        + canonical_headers + '\n' \
        + signed_headers + '\n' \
        + payload_hash

    string_to_sign = algorithm + "\n" \
        + amz_date + "\n" \
        + credential_scope + "\n" \
        + hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()

    signing_key = getSignatureKey(secret_access_key, datestamp, region, service)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    canonical_querystring += "&X-Amz-Signature=" + signature
    request_url = endpoint + canonical_uri + "?" + canonical_querystring
    return request_url
-
main function
def main():
    url = create_pre_signed_url("us-east-1", "en-US", "pcm", "16000")
    # CERT_NONE turns off TLS certificate verification.
    ws = websocket.create_connection(url, sslopt={"cert_reqs": ssl.CERT_NONE})
    # Receive results on a separate thread while the main thread sends audio.
    _thread.start_new_thread(loop_receiving, (ws,))
    print("Receiving...")
    send_data(ws)
    while True:
        time.sleep(1)


main()
-
loop_receiving function
This function is defined above the main function. It receives the data returned by the Amazon Transcribe Streaming Service and prints it.
def loop_receiving(ws):
    try:
        while True:
            result = ws.recv()
            if result == '':
                continue

            # Decode the event stream message and parse its JSON payload.
            eventStreamBuffer = EventStreamBuffer()
            eventStreamBuffer.add_data(result)
            eventStreamMessage = eventStreamBuffer.next()
            stream_payload = eventStreamMessage.payload
            transcript = json.loads(bytes.decode(stream_payload, "UTF-8"))
            print("response:", transcript)

            results = transcript['Transcript']['Results']
            for item in results:
                if 'IsPartial' in item:
                    print('IsPartial:', item['IsPartial'])
                if 'Alternatives' in item:
                    for alternative in item['Alternatives']:
                        if 'Transcript' in alternative:
                            print('Transcript:', alternative['Transcript'])
    except Exception as e:
        if 'WebSocketConnectionClosedException' == e.__class__.__name__:
            print("Error: websocket connection is closed")
        else:
            print(f"Exception Name: {e.__class__.__name__}")
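The receiving loop prints every message, including partial results that are later revised. If only the finished segments are of interest, the decoded message could be filtered along these lines. This is a sketch that assumes the same message layout the loop above reads (Transcript, Results, IsPartial, Alternatives); the helper name final_transcripts is hypothetical.

```python
def final_transcripts(message):
    """Return the text of every non-partial result in one decoded streaming message."""
    texts = []
    for result in message.get("Transcript", {}).get("Results", []):
        if result.get("IsPartial", False):
            # Partial results may still change; skip them.
            continue
        alternatives = result.get("Alternatives", [])
        if alternatives:
            # Take the first alternative of each final result.
            texts.append(alternatives[0]["Transcript"])
    return texts
```

Inside loop_receiving it would be applied to the transcript dictionary right after json.loads.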
-
send_data function
This function is defined above the main function. It sends audio data to the Amazon Transcribe Streaming Service. The testFile variable is the path of the test audio file. The test audio is in PCM format, in English, with a sampling rate of 16000 Hz.
def send_data(ws):
    testFile = "xxx.pcm"
    bufferSize = 1024 * 16

    stream_headers = {
        ":message-type": "event",
        ":event-type": "AudioEvent",
        ":content-type": "application/octet-stream",
    }
    eventstream_serializer = EventStreamMessageSerializer()

    with open(testFile, "rb") as source:
        while True:
            audio_chunk = source.read(bufferSize)
            # Encode audio data
            event_bytes = eventstream_serializer.serialize(stream_headers, audio_chunk)
            ws.send(event_bytes, opcode=0x2)  # opcode 0x2 sends a binary frame
            # The last frame carries b'' (empty audio), which ends the stream
            if len(audio_chunk) == 0:
                break
Java demonstration of Transcribe
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import com.amazonaws.services.transcribe.AmazonTranscribe;
import com.amazonaws.services.transcribe.AmazonTranscribeClientBuilder;
import com.amazonaws.services.transcribe.model.*;

import java.io.*;
import java.util.ArrayList;
import java.util.List;

/**
 * Uploads a local MP3 to S3, transcribes it, and saves the JSON result locally.
 * Note: the job output is only written to S3, so it is saved there first and then downloaded.
 *
 * The basic steps:
 * 1. Establish the S3 client connection
 * 2. Upload the local audio to S3 and get its S3 address
 * 3. Upload the custom vocabulary for the MP3 (not implemented in this code)
 * 4. Create a job in Amazon Transcribe and save the generated JSON file locally
 *
 * @author DELL
 */
public class Mp3ToJsonUtils {
    // Media format of the files to process
    private static final String FILE_TYPE = "mp3";
    // S3 configuration information
    private static final String AWS_ACCESS_KEY = "Generate it yourself";
    private static final String AWS_SECRET_KEY = "Generate it yourself";
    private static final String BUCKET_NAME = "Generate yourself";
    private static final String JOB_BUCKET_NAME = "Generate it yourself";
    // AWS client objects
    private static AmazonS3 s3;
    private static TransferManager tx;
    private static AmazonTranscribe amazonTranscribe;

    static {
        // 1. Establish the connections
        init_with_key();
    }

    public static void main(String[] args) throws Exception {
        List<String> list = new ArrayList<>();
        mp3TOJosn("C:\\Users\\DELL\\Desktop\\BK test data\\A_Cinderella_Atlas_5.mp3", list);
    }

    public static void mp32Josn(String inPath, String savePath, List<String> list) throws Exception {
        // The sanitized file name is used as both the job name and the output key.
        String key = new File(inPath).getName().replaceAll(" ", "_").replaceAll("-", "_");
        File file = new File(savePath, key + ".json");
        // Make sure each file is processed only once
        if (file.exists()) {
            System.out.println(file + "--->has been processed, skip processing");
            return;
        }
        // 2. Upload the file to S3 and get the S3 URI of the uploaded file
        String s3Path = uploadFileToBucket(inPath, BUCKET_NAME);
        // 3. Create the transcription job and wait for it to finish
        createJob(JOB_BUCKET_NAME, FILE_TYPE, key, s3Path);
        // 4. Download the JSON file; wait until it is visible in the output bucket
        while (!isObjectExit(JOB_BUCKET_NAME, key + ".json")) {
            Thread.sleep(1000);
        }
        amazonS3Downloading(s3, JOB_BUCKET_NAME, key + ".json", file.getPath());
    }

    /**
     * Convert an MP3 to a JSON file and save it locally
     *
     * @param inPath path of the local MP3 file
     * @throws Exception if the upload, the job, or the download fails
     */
    public static void mp3TOJosn(String inPath, List<String> list) throws Exception {
        String jsonPath = new File(inPath).getParentFile().getParentFile().getParentFile()
                .getAbsolutePath() + File.separator + "json";
        File file1 = new File(jsonPath);
        if (!file1.exists()) {
            file1.mkdirs();
        }
        mp32Josn(inPath, jsonPath, list);
    }

    /**
     * Connect to AWS with the access key id and secret access key included in the code.
     * Both clients use the same region as the original Transcribe client (US_EAST_2).
     */
    private static void init_with_key() {
        AWSStaticCredentialsProvider provider = new AWSStaticCredentialsProvider(
                new BasicAWSCredentials(AWS_ACCESS_KEY, AWS_SECRET_KEY));
        s3 = AmazonS3ClientBuilder.standard()
                .withCredentials(provider).withRegion(Regions.US_EAST_2).build();
        tx = TransferManagerBuilder.standard().withS3Client(s3).build();
        amazonTranscribe = AmazonTranscribeClientBuilder.standard()
                .withCredentials(provider).withRegion(Regions.US_EAST_2).build();
    }

    /**
     * Upload a local file to the bucket named bucketName
     *
     * @param path       path of the local file to upload
     * @param bucketName name of the S3 bucket that stores the file
     * @return the S3 URI of the uploaded file
     */
    private static String uploadFileToBucket(String path, String bucketName)
            throws FileNotFoundException, InterruptedException {
        File fileToUpload = new File(path);
        if (!fileToUpload.exists()) {
            throw new FileNotFoundException(path + " does not exist!");
        }
        String keyName = fileToUpload.getName();
        Upload upload = tx.upload(new PutObjectRequest(bucketName, keyName, fileToUpload));
        upload.waitForCompletion();
        System.out.println(path + " MP3 uploaded successfully!");
        return "s3://" + bucketName + "/" + keyName;
    }

    /**
     * Create a transcription job and wait until it completes
     *
     * @param bucketName  name of the S3 bucket that receives the output
     * @param mediaFormat media format, e.g. mp3, mp4
     * @param jobName     name of the job to be created
     * @param S3Path      S3 URI of the MP3 (or other media file) in the S3 bucket
     */
    protected static void createJob(String bucketName, String mediaFormat, String jobName, String S3Path) {
        Media media = new Media();
        media.setMediaFileUri(S3Path);
        // Set the job parameters
        StartTranscriptionJobRequest startTranscriptionJobRequest = new StartTranscriptionJobRequest()
                .withMedia(media)
                .withLanguageCode(LanguageCode.EnUS)
                .withMediaFormat(mediaFormat)
                .withOutputBucketName(bucketName)
                .withTranscriptionJobName(jobName);
        amazonTranscribe.startTranscriptionJob(startTranscriptionJobRequest);

        GetTranscriptionJobRequest request = new GetTranscriptionJobRequest()
                .withTranscriptionJobName(jobName);
        TranscriptionJob job = amazonTranscribe.getTranscriptionJob(request).getTranscriptionJob();
        while (!"COMPLETED".equals(job.getTranscriptionJobStatus())) {
            if ("FAILED".equals(job.getTranscriptionJobStatus())) {
                System.out.println(job.getTranscriptionJobName() + "---> is failed");
                System.out.println(job.getTranscriptionJobName() + "--->" + job.getFailureReason());
                throw new RuntimeException("transcribe failed");
            }
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            job = amazonTranscribe.getTranscriptionJob(request).getTranscriptionJob();
        }
        System.out.println(jobName + " Mp3 Job generated successfully");
    }

    /**
     * Download a file from S3 to the local disk
     *
     * @param s3Client       S3 client
     * @param bucketName     bucket name
     * @param key            file name
     * @param targetFilePath local path
     */
    public static void amazonS3Downloading(AmazonS3 s3Client, String bucketName, String key,
                                           String targetFilePath) {
        try (S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
             InputStream input = object.getObjectContent();
             FileOutputStream fileOutputStream = new FileOutputStream(targetFilePath)) {
            System.out.println("Content-Type: " + object.getObjectMetadata().getContentType());
            // Copy the stream with a fixed-size buffer
            byte[] data = new byte[8192];
            int len;
            while ((len = input.read(data)) != -1) {
                fileOutputStream.write(data, 0, len);
            }
            System.out.println(targetFilePath + " json file downloaded successfully");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Determine whether the bucket named bucketName contains an object named key
     *
     * @param bucketName bucket name
     * @param key        object key
     * @return true if the object exists
     */
    private static boolean isObjectExit(String bucketName, String key) {
        return s3.doesObjectExist(bucketName, key);
    }
}
Effect demonstration