How much memory is required to run 1 million concurrent tasks?

In this post, I compare the memory consumption of asynchronous and multithreaded programming in popular languages like Rust, Go, Java, C#, Python, Node.js, and Elixir.

A while ago, I needed to compare the performance of several computer programs that handle a large number of network connections. I found that their memory consumption varied greatly, by a factor of more than 20: some consumed only around 100 MB, while others approached 3 GB when handling 10k connections. Unfortunately, these programs are complex and functionally different, so a direct comparison would not be fair or meaningful. This prompted me to create a synthetic benchmark.

Benchmark

I have created the following programs in various programming languages:

Let’s start N concurrent tasks, each of which waits 10 seconds, and have the program exit once all tasks complete. The number of tasks is controlled by a command-line argument.

With a little help from ChatGPT, I can write such a program in minutes, even in a programming language I don’t use every day. For convenience, all benchmark code is available on my GitHub.

Rust

I have created 3 programs in Rust. The first program uses traditional threads. The following is its core code:

let mut handles = Vec::new();
for _ in 0..num_threads {
    let handle = thread::spawn(|| {
        thread::sleep(Duration::from_secs(10));
    });
    handles.push(handle);
}
for handle in handles {
    handle.join().unwrap();
}

Two other versions use asynchronous programming, one uses tokio and the other uses async-std. The following are the core parts of the tokio version:

let mut tasks = Vec::new();
for _ in 0..num_tasks {
    tasks.push(task::spawn(async {
        time::sleep(Duration::from_secs(10)).await;
    }));
}
for task in tasks {
    task.await.unwrap();
}

The async-std variant is very similar, so I won’t quote it here.

Go

In Go, goroutines are the basic building block of concurrency. Instead of waiting for each one individually, we use a WaitGroup:

var wg sync.WaitGroup
for i := 0; i < numRoutines; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        time.Sleep(10 * time.Second)
    }()
}
wg.Wait()

Java

Java has traditionally used threads, but JDK 21 offers a preview of virtual threads, a concept similar to goroutines. So I created two variants of the benchmark. I was also curious how Java threads compare to Rust’s threads.

List<Thread> threads = new ArrayList<>();
for (int i = 0; i < numTasks; i++) {
    Thread thread = new Thread(() -> {
        try {
            Thread.sleep(Duration.ofSeconds(10));
        } catch (InterruptedException e) {
        }
    });
    thread.start();
    threads.add(thread);
}
for (Thread thread : threads) {
    thread.join();
}

Here’s the variant with virtual threads. Notice how similar it is to the previous one. Almost identical!

List<Thread> threads = new ArrayList<>();
for (int i = 0; i < numTasks; i++) {
    Thread thread = Thread.startVirtualThread(() -> {
        try {
            Thread.sleep(Duration.ofSeconds(10));
        } catch (InterruptedException e) {
        }
    });
    threads.add(thread);
}
for (Thread thread : threads) {
    thread.join();
}

C#

C#, like Rust, has first-class support for async/await:

List<Task> tasks = new List<Task>();
for (int i = 0; i < numTasks; i++)
{
    Task task = Task.Run(async () =>
    {
        await Task.Delay(TimeSpan.FromSeconds(10));
    });
    tasks.Add(task);
}
await Task.WhenAll(tasks);

Node.js

The same goes for Node.js:

const delay = util.promisify(setTimeout);
const tasks = [];

for (let i = 0; i < numTasks; i ++ ) {
    tasks.push(delay(10000));
}

await Promise.all(tasks);

Python

And Python added async/await in 3.5, so we can write:

async def perform_task():
    await asyncio.sleep(10)


tasks = []

for task_id in range(num_tasks):
    task = asyncio.create_task(perform_task())
    tasks.append(task)

await asyncio.gather(*tasks)

Elixir

Elixir is also known for its asynchronous capabilities:

tasks =
    for _ <- 1..num_tasks do
        Task.async(fn ->
            :timer.sleep(10000)
        end)
    end

Task.await_many(tasks, :infinity)

Test environment

  • Hardware: Intel(R) Xeon(R) CPU E3-1505M v6 @ 3.00GHz

  • Operating system: Ubuntu 22.04 LTS, Linux p5520 5.15.0-72-generic

  • Rust version: 1.69

  • Go version: 1.18.1

  • Java version: OpenJDK 21-ea, build 21-ea+22-1890

  • .NET version: 6.0.116

  • Node.js version: v12.22.9

  • Python version: 3.10.6

  • Elixir version: Erlang/OTP 24 erts-12.2.1, Elixir 1.12.2

All programs run in release mode where available. Other options keep default settings.
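The post doesn’t show how peak memory was measured, so here is one common approach on Linux rather than the author’s actual method: GNU time’s verbose mode reports the peak resident set size of a finished process, and the kernel exposes the same high-water mark as `VmHWM` in `/proc`:

```shell
# Peak resident set size of a process, as reported by GNU time
# ("sleep 1" stands in for a benchmark binary here):
/usr/bin/time -v sleep 1 2>&1 | grep "Maximum resident set size"

# The kernel tracks the same high-water mark per process; for the
# current shell it can be read directly from /proc:
grep VmHWM /proc/self/status
```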

Results

Minimum memory usage

Let’s start with some small tests. Since some runtimes reserve memory for their own use, we first start only one task.

Figure 1: Peak memory required to start a task

We can see that the programs are clearly divided into two groups.

The Go and Rust programs, statically compiled to native binaries, require very little memory. The programs running on a managed platform or through an interpreter consume more, although Python performs very well in this case. The difference in memory consumption between the two groups is about an order of magnitude.

To my surprise, .NET had the worst memory footprint, but I’m guessing this could probably be tuned with some settings. Let me know in the comments if you have any tips. I didn’t see much difference between debug mode and release mode.

10,000 tasks

Figure 2: Peak memory required to start 10,000 tasks

Here are some surprising results! As you might expect, threads don’t win this benchmark. That holds for Java threads, which consume almost 250 MB of memory. But the native Linux threads used by Rust seem lightweight enough that, at 10,000 threads, memory consumption is still lower than the idle consumption of many other runtimes. Asynchronous tasks or virtual (green) threads may be lighter than native threads, but with only 10,000 tasks that advantage doesn’t show yet. We need more tasks.
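To put the Java figure in perspective, a quick back-of-the-envelope calculation (the ~250 MB value comes from the chart above; the rest is plain arithmetic):

```python
# Rough per-thread cost implied by the 10,000-task Java result
java_peak_mb = 250      # approximate peak memory from the chart
num_tasks = 10_000

per_task_kib = java_peak_mb * 1024 / num_tasks
print(f"~{per_task_kib:.1f} KiB per Java thread")  # ~25.6 KiB
```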

Another surprising result is Go. Goroutines are supposed to be very lightweight, but in reality they consume more than 50% of the memory required by Rust threads. Honestly, I expected Go to have a bigger advantage. So I conclude that at 10,000 concurrent tasks, threads are still a pretty competitive choice. The Linux kernel is definitely doing something right here.

The small advantage Go had in the previous benchmark has also disappeared: its memory consumption is now more than 6 times that of the best Rust program. It has also been surpassed by Python.

One final surprise: .NET’s memory consumption didn’t increase significantly when the number of tasks reached 10,000. Maybe it is just using pre-allocated memory, or its idle memory usage is already so high that 10,000 tasks don’t register.

100,000 tasks

I couldn’t start 100,000 threads on my system, so the thread-based benchmarks had to be ruled out. It might have been possible with some tweaks to system settings, but after an hour of trying I gave up. So at 100,000 tasks, you probably don’t want to use threads.
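One likely reason 100,000 threads is hard to reach with default settings (a sketch based on typical Linux defaults, not on the author’s exact configuration): each pthread reserves a stack, 8 MiB of virtual address space by default (see `ulimit -s`), and each stack also adds entries to the process’s memory-map count, which is capped by `vm.max_map_count` (65530 by default):

```python
# Virtual address space reserved by thread stacks under default settings
num_threads = 100_000
default_stack_mib = 8                  # typical `ulimit -s` on Linux

reserved_gib = num_threads * default_stack_mib / 1024
print(f"~{reserved_gib:.0f} GiB of address space reserved")  # ~781 GiB
```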

Figure 3: Peak memory required to start 100,000 tasks

At this point, Go programs are surpassed not only by Rust, but also by Java, C#, and Node.js.

And .NET under Linux might be cheating a bit, as its memory usage still doesn’t increase. 😉 I had to double-check that it actually started the correct number of tasks, and it did. It still exits after about 10 seconds, so it doesn’t block the main loop. Magic! Great job, .NET.

1 million tasks

Now let’s do a limit test.

At 1 million tasks, Elixir gave up with a ** (SystemLimitError): the system limit was reached. The other languages held on.

Figure 4: Peak memory required to start 1 million tasks

Finally, we see an increase in the C# program’s memory consumption. But it is still very competitive, and it even manages to slightly outperform one of the Rust runtimes!

The gap between Go and the other languages has widened: Go now uses more than 12 times as much memory as the winner. It also uses more than twice as much as Java, which contradicts the common belief that the JVM is memory-greedy and Go is lightweight.

Rust’s tokio remains unbeaten. This is not surprising after what we saw at 100,000 tasks.

Summary

As we observed, a large number of concurrent tasks can consume a large amount of memory, even if these tasks do not perform complex operations. Different language runtimes have different trade-offs. Some are lightweight and efficient when processing a small number of tasks, but they do not scale well when processing hundreds of thousands of tasks. In contrast, other runtimes with higher initial overhead can easily handle high loads. It is important to note that not all runtimes can handle very large numbers of concurrent tasks with their default settings.

This comparison focuses only on memory consumption, while other factors such as task startup time and communication speed are equally important. It’s worth noting that at 1 million tasks, the overhead of task startup becomes noticeable, with most programs taking more than 12 seconds to complete.
