Amazon Fargate uses Seekable OCI for faster container launches


Containers have become an increasingly popular way to deploy and scale applications, but there are still areas where improvements can be made. One of the main issues with scaling containerized applications is long startup times, especially during horizontal scaling, when new instances need to be added. This issue can negatively impact the customer experience (for example, when a website needs to scale out to serve additional traffic).

A research paper [1] shows that pulling the container image accounts for 76% of container startup time, yet only 6.4% of that data is actually read. Launching or scaling out a containerized application requires downloading the container image from a remote container registry, and because the entire image must be downloaded and decompressed before the application can start, this introduces a significant delay.

One solution to this problem is to lazy load (also known as asynchronously load) container images. With lazy loading, data is fetched from the container registry while the application is starting, which shortens the overall container startup time. The stargz-snapshotter project is one example of this approach.

Last year, we introduced Seekable OCI (SOCI), a technology open sourced by Amazon Web Services that enables container runtimes to lazy load container images so that applications launch faster, without modifying the container image. As part of this effort, we open sourced SOCI Snapshotter, a snapshotter plugin that enables lazy loading with SOCI in containerd.

Amazon Fargate support for SOCI

Amazon Fargate now supports Seekable OCI (SOCI), which enables containers to start without waiting for the entire container image to be downloaded, helping applications deploy and scale out faster. At launch, this new feature is available for Amazon Elastic Container Service (Amazon ECS) applications running on Amazon Fargate.

Here’s a quick overview of how Amazon Fargate’s support for SOCI works:

SOCI works by creating an index of the files (a SOCI index) within an existing container image. This index is key to faster container startup: it makes it possible to extract individual files from the container image without downloading the entire image. Applications no longer need to wait for the container image to be fully pulled and decompressed before they start running. This allows you to deploy and scale out your applications faster and shortens the time it takes to release application updates.

SOCI indexes are generated and stored separately from container images. This means there is no need to convert container images to use SOCI, so Secure Hash Algorithm (SHA)-based security, such as container image signatures, is not broken. The index is stored in the registry alongside the container image. At launch, Amazon Fargate’s support for SOCI works with Amazon Elastic Container Registry (Amazon ECR).

When you use Amazon ECS with Amazon Fargate to run a container image that has a SOCI index, Amazon Fargate automatically detects the index and starts the container without waiting for the entire image to be pulled. Container images without SOCI indexes continue to run on Amazon Fargate just as they do today.

Let’s get started!

There are two ways to create SOCI indexes for container images.

  • Use Amazon SOCI Index Builder: Amazon SOCI Index Builder is a serverless solution for indexing container images on Amazon Web Services. This Amazon CloudFormation stack deploys an Amazon EventBridge rule that identifies Amazon ECR action events and invokes an Amazon Lambda function to match them against the defined filter criteria. Another Amazon Lambda function then generates the SOCI index and pushes it to the repository in the Amazon ECR registry.

  • Manually create SOCI indexes: This method provides more flexibility in creating SOCI indexes, including indexing existing container images in the Amazon ECR repository. To create a SOCI index, you can use the soci CLI provided by the soci-snapshotter project.

Amazon SOCI Index Builder provides you with an automated process to start building SOCI indexes for container images. The soci CLI gives you greater flexibility in index generation and the ability to natively integrate index generation into your CI/CD pipeline.

In this article, we will manually generate the SOCI index using the soci CLI from the soci-snapshotter project.

Create a repository and push container images

First, use the Amazon CLI to create an Amazon ECR repository called pytorch-soci for the container image.

Bash

$ aws ecr create-repository --region us-east-1 --repository-name pytorch-soci


Note the repository URI (repositoryUri) in the output and store it, together with an image tag, in a variable to make it easier to reference the repository in the following steps.

Bash

$ ECRSOCIURI=xyz.dkr.ecr.us-east-1.amazonaws.com/pytorch-soci:latest
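Rather than copying the URI by hand, you can also assemble it from its parts, because Amazon ECR image URIs follow the pattern <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>. The helper below is a minimal sketch: ecr_image_uri is a hypothetical function of our own, not part of the Amazon CLI, and it assumes the amazonaws.com endpoint suffix.

```shell
#!/usr/bin/env bash
# ecr_image_uri is a hypothetical helper, not part of the Amazon CLI.
# Usage: ecr_image_uri ACCOUNT_ID REGION REPOSITORY [TAG]   (TAG defaults to "latest")
# Assumes the amazonaws.com endpoint suffix; Regions in China use amazonaws.com.cn instead.
ecr_image_uri() {
  local account=$1 region=$2 repo=$3 tag=${4:-latest}
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account" "$region" "$repo" "$tag"
}

# Example (requires credentials): look up the current account ID with STS, then build the URI.
# ECRSOCIURI=$(ecr_image_uri "$(aws sts get-caller-identity --query Account --output text)" us-east-1 pytorch-soci)
```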


For the example application, we use the CPU-based PyTorch training container image from Amazon Web Services Deep Learning Containers. The soci CLI works with images in the containerd content store, but Docker Engine stores container images in its own image store by default, so use the nerdctl CLI (which uses containerd) to pull the container image.

Bash

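$ SAMPLE_IMAGE="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.1-cpu-py36-ubuntu16.04"
# Log in to the registry that hosts the sample image, and to your own registry for the push later
$ aws ecr get-login-password --region us-east-1 | sudo nerdctl login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
$ aws ecr get-login-password --region us-east-1 | sudo nerdctl login --username AWS --password-stdin xyz.dkr.ecr.us-east-1.amazonaws.com
$ sudo nerdctl pull --platform linux/amd64 $SAMPLE_IMAGE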
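The registry host passed to nerdctl login has to match the registry that serves the image, including its Region; logging in to a registry in a different Region from the one you pull from is a common cause of authentication errors. One way to avoid typing the host by hand is to derive it from the image reference. This is a minimal sketch: ecr_registry and ecr_region are hypothetical helpers of our own, and ecr_region assumes the standard <account>.dkr.ecr.<region>.amazonaws.com host layout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Amazon CLI or nerdctl.

# ecr_registry IMAGE_REF -> registry host (everything before the first "/")
ecr_registry() {
  printf '%s\n' "${1%%/*}"
}

# ecr_region IMAGE_REF -> Region, taken from the fourth dot-separated field of an
# <account>.dkr.ecr.<region>.amazonaws.com host
ecr_region() {
  ecr_registry "$1" | cut -d. -f4
}

# Example (requires credentials and nerdctl):
# aws ecr get-login-password --region "$(ecr_region "$SAMPLE_IMAGE")" |
#   sudo nerdctl login --username AWS --password-stdin "$(ecr_registry "$SAMPLE_IMAGE")"
```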


Then, tag the container image with the URI of the repository created earlier.

Bash

$ sudo nerdctl tag $SAMPLE_IMAGE $ECRSOCIURI


Next, push the container image to the Amazon ECR repository.

Bash

$ sudo nerdctl push $ECRSOCIURI


At this point, the container image already exists in the Amazon ECR repository.


Create a SOCI index

Next, we need to create the SOCI index.

SOCI indexes are the building blocks that enable lazy loading of container images. A SOCI index consists of 1) a SOCI index manifest and 2) a set of zTOCs. The following diagram illustrates the components of the SOCI index manifest and how it references the container image manifest.

[Diagram: the components of a SOCI index manifest and how it references the container image manifest]

The SOCI index manifest contains the list of zTOCs and a reference to the image from which the manifest was generated. A zTOC, or table of contents for compressed data, consists of two parts:

  1. TOC, a table of contents containing file metadata and the corresponding offsets in the decompressed TAR archive.

  2. zInfo, a collection of checkpoints representing the state of the compression engine at various points in the container image layer.
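To make the roles of the TOC and zInfo more concrete, here is a simplified sketch of how a file listed in the TOC could be mapped to the spans that would need to be fetched. It is our own illustration, not the soci-snapshotter implementation: it assumes each span begins at a known offset in the decompressed layer, and the checkpoint values used below are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified illustration of a TOC + zInfo lookup; not the soci-snapshotter implementation.
# Checkpoint arguments are the decompressed offsets where each span begins
# (span 0 begins at 0), in ascending order.

# span_for_offset OFFSET CHECKPOINT... -> index of the span that contains OFFSET
span_for_offset() {
  local offset=$1 span=0 i=0 cp
  shift
  for cp in "$@"; do
    # Keep the last checkpoint that starts at or before OFFSET.
    if [ "$cp" -le "$offset" ]; then span=$i; fi
    i=$((i + 1))
  done
  echo "$span"
}

# file_spans OFFSET SIZE CHECKPOINT... -> "start_span end_span" for one TOC entry
file_spans() {
  local offset=$1 size=$2 last
  shift 2
  # The file's last byte decides end_span; directories and empty files have size 0.
  if [ "$size" -gt 0 ]; then last=$((offset + size - 1)); else last=$offset; fi
  echo "$(span_for_offset "$offset" "$@") $(span_for_offset "$last" "$@")"
}
```

With the hypothetical checkpoints 0, 9000000, and 17500000, `file_spans 8999000 5000 0 9000000 17500000` prints `0 1`: the file straddles a checkpoint, so both spans would have to be fetched and decompressed to read it.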

To learn more about this concept and terminology, visit the soci-snapshotter terminology page (https://github.com/awslabs/soci-snapshotter/blob/main/docs/glossary.md).

Before creating a SOCI index, you need to install the soci CLI. For installation instructions, see Getting Started with soci-snapshotter (https://github.com/awslabs/soci-snapshotter/blob/main/docs/getting-started.md).

We use the soci create command to create the SOCI index.

Bash

$ sudo soci create $ECRSOCIURI
layer sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b -> ztoc skipped
layer sha256:ab09082b308205f9bf973c4b887132374f34ec64b923deef7e2f7ea1a34c1dad -> ztoc skipped
layer sha256:cd413555f0d1643e96fe0d4da7f5ed5e8dc9c6004b0731a0a810acab381d8c61 -> ztoc skipped
layer sha256:eee85b8a173b8fde0e319d42ae4adb7990ed2a0ce97ca5563cf85f529879a301 -> ztoc skipped
layer sha256:3a1b659108d7aaa52a58355c7f5704fcd6ab1b348ec9b61da925f3c3affa7efc -> ztoc skipped
layer sha256:d8f520dcac6d926130409c7b3a8f77aea639642ba1347359aaf81a8b43ce1f99 -> ztoc skipped
layer sha256:d75d26599d366ecd2aa1bfa72926948ce821815f89604b6a0a49cfca100570a0 -> ztoc skipped
layer sha256:a429d26ed72a85a6588f4b2af0049ae75761dac1bb8ba8017b8830878fb51124 -> ztoc skipped
layer sha256:5bebf55933a382e053394e285accaecb1dec9e215a5c7da0b9962a2d09a579bc -> ztoc skipped
layer sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 -> ztoc skipped
layer sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 -> ztoc skipped
layer sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 -> ztoc sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4
layer sha256:089632f60d8cfe243c5bc355a77401c9a8d2f415d730f00f6f91d44bb96c251b -> ztoc sha256:f6a16d3d07326fe3bddbdb1aab5fbd4e924ec357b4292a6933158cc7cc33605b
layer sha256:f18dd99041c3095ade3d5013a61a00eeab8b878ba9be8545c2eabfbca3f3a7f3 -> ztoc sha256:95d7966c964dabb54cb110a1a8373d7b88cfc479336d473f6ba0f275afa629dd
layer sha256:69e1edcfbd217582677d4636de8be2a25a24775469d677664c8714ed64f557c3 -> ztoc sha256:ac0e18bd39d398917942c4b87ac75b90240df1e5cb13999869158877b400b865


From the output above, you can see that the soci CLI created zTOCs for only four of the container image layers. The remaining layers are downloaded in full before the container starts, and only these four layers are lazy loaded. This is because lazy loading very small layers does little to improve startup time. You can change this behavior with the --min-layer-size flag when running soci create.
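If you index many images, it can be handy to summarize this output rather than reading it line by line. The following is a minimal sketch: summarize_soci_create is a hypothetical helper name, and it assumes the `layer <digest> -> ztoc <digest|skipped>` line format shown in the output above.

```shell
#!/usr/bin/env bash
# summarize_soci_create is a hypothetical helper. It reads soci create output on stdin,
# assuming the "layer <digest> -> ztoc <digest|skipped>" line format shown above,
# and reports how many layers received a zTOC and how many were skipped.
summarize_soci_create() {
  awk '
    $1 == "layer" && $3 == "->" && $4 == "ztoc" {
      if ($5 == "skipped") skipped++; else indexed++
    }
    END { printf "indexed=%d skipped=%d\n", indexed, skipped }
  '
}
```

Assuming soci create writes these lines to standard output, piping `sudo soci create $ECRSOCIURI` into summarize_soci_create for the run above would report `indexed=4 skipped=11`.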

Verify and push SOCI index

The soci CLI also provides several commands to help you view the generated SOCI index.

To see a list of all index manifests, you can run the following command.

Bash

$ sudo soci index list


DIGEST                                                                     SIZE    IMAGE REF                                                                                   PLATFORM       MEDIA TYPE                                    CREATED
sha256:ea5c3489622d4e97d4ad5e300c8482c3d30b2be44a12c68779776014b15c5822    1931    xyz.dkr.ecr.us-east-1.amazonaws.com/pytorch-soci:latest                                     linux/amd64    application/vnd.oci.image.manifest.v1+json    10m4s ago
sha256:ea5c3489622d4e97d4ad5e300c8482c3d30b2be44a12c68779776014b15c5822    1931    763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.1-cpu-py36-ubuntu16.04    linux/amd64    application/vnd.oci.image.manifest.v1+json    10m4s ago


Viewing the list of zTOCs is optional, but if you need to, use the following command:

Bash

$ sudo soci ztoc list
DIGEST                                                                     SIZE        LAYER DIGEST
sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4    2038072     sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888
sha256:95d7966c964dabb54cb110a1a8373d7b88cfc479336d473f6ba0f275afa629dd    11442416    sha256:f18dd99041c3095ade3d5013a61a00eeab8b878ba9be8545c2eabfbca3f3a7f3
sha256:ac0e18bd39d398917942c4b87ac75b90240df1e5cb13999869158877b400b865    36277264    sha256:69e1edcfbd217582677d4636de8be2a25a24775469d677664c8714ed64f557c3
sha256:f6a16d3d07326fe3bddbdb1aab5fbd4e924ec357b4292a6933158cc7cc33605b    10152696    sha256:089632f60d8cfe243c5bc355a77401c9a8d2f415d730f00f6f91d44bb96c251b


Together, these zTOCs contain all the information SOCI needs to find a given file within a container image layer. To view the zTOC for a particular layer, take one of the zTOC digests from the previous output and pass it to the following command.

Bash

$ sudo soci ztoc info sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4
{
  "version": "0.9",
  "build_tool": "AWS SOCI CLI v0.1",
  "size": 2038072,
  "span_size": 4194304,
  "num_spans": 33,
  "num_files": 5552,
  "num_multi_span_files": 26,
  "files": [
    {
      "filename": "bin/",
      "offset": 512,
      "size": 0,
      "type": "dir",
      "start_span": 0,
      "end_span": 0
    },
    {
      "filename": "bin/bash",
      "offset": 1024,
      "size": 1037528,
      "type": "reg",
      "start_span": 0,
      "end_span": 0
    }


---Trimmed for brevity---


Now, push all SOCI-related artifacts to Amazon ECR using the following command.

Bash

$ PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ sudo soci push --user AWS:$PASSWORD $ECRSOCIURI


You can verify that the index was pushed by opening the repository in the Amazon ECR console. Two additional objects appear next to the container image: the SOCI index and an image index. The image index allows Amazon Fargate to find the SOCI index associated with the container image.


Learn about SOCI performance

The main goal of SOCI is to minimize the time required to launch a containerized application. To measure the benefit of lazy loading container images with SOCI on Amazon Fargate, we compare how long the same container image takes to start with and without a SOCI index.

To determine how long each task takes to start, we can use two timestamps returned by the Amazon ECS DescribeTasks API. The first, createdAt, is the time when the task was created and entered the PENDING state. The second, startedAt, is the time when the task transitioned from the PENDING state to the RUNNING state.

For comparison, we created a second Amazon ECR repository, pytorch-without-soci, containing the same container image but without a SOCI index. Comparing the two repositories, pytorch-soci contains two additional objects (an image index and a SOCI index) that do not exist in pytorch-without-soci.


Deploy and run the application

To run these applications, I created an Amazon ECS cluster named demo-pytorch-soci-cluster, a VPC, and the required ECS task execution role. If you are new to Amazon ECS, you can read Getting Started with Amazon ECS (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/getting-started.html) to learn how to deploy and run containerized applications.

Now, let’s deploy and run these two container images with FARGATE as the launch type, running five tasks for each of pytorch-soci and pytorch-without-soci.

Bash

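# Replace <SUBNET_ID> with a subnet in your VPC (assignPublicIp=ENABLED assumes a public subnet)
$ aws ecs \
    --region us-east-1 \
    run-task \
    --count 5 \
    --launch-type FARGATE \
    --task-definition arn:aws:ecs:us-east-1:XYZ:task-definition/pytorch-soci \
    --network-configuration "awsvpcConfiguration={subnets=[<SUBNET_ID>],assignPublicIp=ENABLED}" \
    --cluster demo-pytorch-soci-cluster


$ aws ecs \
    --region us-east-1 \
    run-task \
    --count 5 \
    --launch-type FARGATE \
    --task-definition arn:aws:ecs:us-east-1:XYZ:task-definition/pytorch-without-soci \
    --network-configuration "awsvpcConfiguration={subnets=[<SUBNET_ID>],assignPublicIp=ENABLED}" \
    --cluster demo-pytorch-soci-cluster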


After a few minutes, there are 10 running tasks on the ECS cluster.


After confirming that all tasks are running, we run the following script to retrieve the createdAt and startedAt timestamps for each task.

Bash

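#!/bin/bash
# Replace the placeholders with your cluster name and task definition family
CLUSTER="<CLUSTER_NAME>"
TASKDEF="<TASK_DEFINITION>"
REGION="us-east-1"

# Collect the ARNs of the running tasks launched from the task definition family
TASKS=$(aws ecs list-tasks \
    --cluster "$CLUSTER" \
    --family "$TASKDEF" \
    --region "$REGION" \
    --query 'taskArns[*]' \
    --output text)

# $TASKS is intentionally unquoted so that each task ARN is passed as a separate argument
aws ecs describe-tasks \
    --tasks $TASKS \
    --region "$REGION" \
    --cluster "$CLUSTER" \
    --query "tasks[] | reverse(sort_by(@, &createdAt)) | [].[{startedAt: startedAt, createdAt: createdAt, taskArn: taskArn}]" \
    --output table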


Running the script for pytorch-without-soci (the image without a SOCI index) produces the following output:

Bash

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                            DescribeTasks                                                                             |
+----------------------------------+----------------------------------+------------------------------------------------------------------------------------------------+
|            createdAt             |            startedAt             |                                            taskArn                                             |
+----------------------------------+----------------------------------+------------------------------------------------------------------------------------------------+
| 2023-07-07T17:43:59.233000+00:00 | 2023-07-07T17:46:09.856000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/dcdf19b6e66444aeb3bc607a3114fae0 |
| 2023-07-07T17:43:59.233000+00:00 | 2023-07-07T17:46:09.459000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/9178b75c98ee4c4e8d9c681ddb26f2ca |
| 2023-07-07T17:43:59.233000+00:00 | 2023-07-07T17:46:21.645000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/7da51e036c414cbab7690409ce08cc99 |
| 2023-07-07T17:43:59.233000+00:00 | 2023-07-07T17:46:00.606000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/5ee8f48194874e6dbba75a5ef753cad2 |
| 2023-07-07T17:43:59.233000+00:00 | 2023-07-07T17:46:02.461000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/58531a9e94ed44deb5377fa997caec36 |
+----------------------------------+----------------------------------+------------------------------------------------------------------------------------------------+


Averaged across the five tasks, the time between createdAt and startedAt for pytorch-without-soci (without a SOCI index) is about 129.6 seconds.
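The average can be reproduced from the table with a short script. This is a minimal sketch: avg_startup_seconds is a hypothetical helper name, and it assumes each input line holds a task’s createdAt and startedAt values as ISO 8601 timestamps with a numeric UTC offset (such as 2023-07-07T17:43:59.233000+00:00). It does the date arithmetic in plain awk, using Howard Hinnant’s days-from-civil algorithm, so it does not depend on GNU date.

```shell
#!/usr/bin/env bash
# avg_startup_seconds is a hypothetical helper.
# Input on stdin: one task per line, "createdAt startedAt", each formatted as
# YYYY-MM-DDTHH:MM:SS[.fraction]+HH:MM (or -HH:MM). Prints the mean of startedAt - createdAt.
avg_startup_seconds() {
  awk '
    # Days since 1970-01-01 for a Gregorian date (days_from_civil by Howard Hinnant).
    function days_from_civil(y, m, d,    era, yoe, doy, doe) {
      y -= (m <= 2)
      era = int((y >= 0 ? y : y - 399) / 400)
      yoe = y - era * 400
      doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
      doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
      return era * 146097 + doe - 719468
    }
    # Convert one timestamp to fractional seconds since the Unix epoch (UTC).
    function to_epoch(ts,    date, clock, tz, sign, p, t, z, secs) {
      date = substr(ts, 1, 10); clock = substr(ts, 12)
      # Split off the trailing six-character UTC offset, for example "+00:00".
      tz = substr(clock, length(clock) - 5); clock = substr(clock, 1, length(clock) - 6)
      split(date, p, "-"); split(clock, t, ":"); split(substr(tz, 2), z, ":")
      sign = (substr(tz, 1, 1) == "-") ? -1 : 1
      secs = t[1] * 3600 + t[2] * 60 + t[3] - sign * (z[1] * 3600 + z[2] * 60)
      return days_from_civil(p[1] + 0, p[2] + 0, p[3] + 0) * 86400 + secs
    }
    NF >= 2 { total += to_epoch($2) - to_epoch($1); n++ }
    END { if (n) printf "%.3f\n", total / n }
  '
}
```

Fed the five createdAt/startedAt pairs from the table above, it prints 129.572.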

Next, run the same script for pytorch-soci, the image with a SOCI index.

Bash

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                            DescribeTasks                                                                             |
+----------------------------------+----------------------------------+------------------------------------------------------------------------------------------------+
|            createdAt             |            startedAt             |                                            taskArn                                             |
+----------------------------------+----------------------------------+------------------------------------------------------------------------------------------------+
| 2023-07-07T17:43:53.318000+00:00 | 2023-07-07T17:44:51.076000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/c57d8cff6033494b97f6fd0e1b797b8f |
| 2023-07-07T17:43:53.318000+00:00 | 2023-07-07T17:44:52.212000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/6d168f9e99324a59bd6e28de36289456 |
| 2023-07-07T17:43:53.318000+00:00 | 2023-07-07T17:45:05.443000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/4bdc43b4c1f84f8d9d40dbd1a41645da |
| 2023-07-07T17:43:53.318000+00:00 | 2023-07-07T17:44:50.618000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/43ea53ea84154d5aa90f8fdd7414c6df |
| 2023-07-07T17:43:53.318000+00:00 | 2023-07-07T17:44:50.777000+00:00 | arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/0731bea30d42449e9006a5d8902756d5 |
+----------------------------------+----------------------------------+------------------------------------------------------------------------------------------------+


Here, tasks using the SOCI-indexed image pytorch-soci reached the RUNNING state about 60.7 seconds after they were created, on average.

In other words, for this sample application, using the SOCI index cut task startup time on Amazon Fargate roughly in half (by about 53%) compared with running the same image without a SOCI index.
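The comparison can be expressed both as a percentage reduction in startup time and as a speedup factor. This is a minimal sketch: startup_improvement is a hypothetical helper that takes the baseline and the improved average startup times in seconds.

```shell
#!/usr/bin/env bash
# startup_improvement is a hypothetical helper.
# Usage: startup_improvement BASELINE_SECONDS IMPROVED_SECONDS
# Prints the percentage reduction in startup time and the speedup factor.
startup_improvement() {
  awk -v base="$1" -v after="$2" 'BEGIN {
    printf "reduction=%.1f%% speedup=%.2fx\n", (base - after) / base * 100, base / after
  }'
}
```

With the averages computed from the two tables above (129.5724 and 60.7072 seconds), it prints `reduction=53.1% speedup=2.13x`.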

We recommend benchmarking your application’s startup and scale-out times with and without SOCI. This helps you understand how your application performs and whether it benefits from Amazon Fargate’s support for SOCI.

Customer Voices

During the private preview, we heard a lot of feedback from customers about Amazon Fargate’s support for SOCI. Here’s what our customers have to say:


Autodesk provides critical design, manufacturing, and operations software solutions to the architecture, engineering, construction, manufacturing, media, and entertainment industries. “SOCI improved launch performance by 50% when running time-sensitive simulation workloads on Amazon ECS using Amazon Fargate. This enabled our application to scale out faster, allowing us to quickly meet the demand of our growing user base and save costs by reducing idle compute capacity. The Amazon Web Services partner solutions for creating SOCI indexes are easy to configure and deploy,” said Boaz Brudner, Head of Innovyze SaaS Engineering, AI and Architecture at Autodesk.


Flywire is a global payments enablement and software company on a mission to deliver the world’s most important and complex payments. “We run a multi-step deployment pipeline using Amazon Fargate on Amazon ECS, and the entire process takes minutes to complete. With SOCI, we didn’t have to make any changes to the application or deployment process, and the overall pipeline time dropped by more than 50%. This allowed us to significantly reduce the release time of application updates. For some larger images over 750 MB, SOCI reduced task startup time by more than 60%,” said Samuel Burgos, Sr. Cloud Security Engineer at Flywire.


Virtuoso is a leading software company that produces functional UI and end-to-end testing software. “SOCI helps us reduce the latency between compute demand and availability. We have extremely bursty workloads that our customers want to spin up as quickly as possible. SOCI accelerates our ECS tasks by 40%, allowing us to scale applications quickly and reduce the pool of idle compute capacity, so we can deliver value more efficiently. Setting up SOCI is really simple. We chose to use an Amazon Web Services Partner Solution (Quick Start), which keeps our build and deployment pipeline intact,” said Mathew Hall, Head of Site Reliability Engineering at Virtuoso.

Notes

Availability: Amazon Fargate support for SOCI is available in all Amazon Web Services Regions where Amazon ECS, Amazon Fargate, and Amazon ECR are available.

Pricing: Amazon Fargate support for SOCI is available at no additional cost; you only pay for storing SOCI indexes in Amazon ECR.

Getting started: Learn more about the benefits and getting started guide on the Amazon Fargate support for SOCI page.

Happy building!

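[1] T. Harter, B. Salmon, R. Liu, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “Slacker: Fast Distribution with Lazy Docker Containers,” USENIX FAST ’16. https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter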
