In-depth analysis of Lettuce: why a single connection can handle highly concurrent Redis requests

Introduction

What is Lettuce

Spring Boot has used Lettuce as its default Redis client since version 2.0. Lettuce is built on Netty's NIO framework, and for most Redis operations it needs only a single connection to efficiently support concurrent requests from the business side – a sharp contrast to the Jedis connection pool model. At the same time, Lettuce supports a more comprehensive feature set, and its performance is no worse than, and often better than, that of Jedis.

The official introduction reads: Lettuce is a scalable thread-safe Redis client that provides synchronous, asynchronous and reactive APIs. Multiple threads can share a connection if they avoid blocking and transactional operations (such as BLPOP and MULTI/EXEC). The excellent Netty NIO framework manages multiple connections efficiently. This includes support for advanced Redis features such as Sentinel, Cluster, and the Redis data models.

Netty overview

Since Netty is the underlying framework of Lettuce, this section gives a brief introduction to Netty NIO. As "Netty in Action" puts it: from a high-level perspective, Netty addresses the two major concerns of network programming, technology and architecture. First, it builds on Java NIO's asynchronous, event-driven implementation to guarantee maximum performance and scalability for applications under high load; second, it applies a series of design patterns to decouple program logic from the network layer, simplifying development while preserving the testability, modularity, and reusability of the code to the greatest extent.

The figure above shows the core logic of Netty NIO. NIO is usually read as non-blocking I/O. In the figure, a Channel represents a connection channel and carries connection management and read/write operations; an EventLoop is the core abstraction for event processing. One EventLoop can serve multiple Channels, but it is bound to exactly one thread, and all I/O events and user tasks on an EventLoop are processed on that thread. Apart from the Selector's event-listening, reads and writes on a channel are performed in a non-blocking manner – this is the important difference between NIO and BIO (blocking I/O) and the reason for NIO's excellent performance.
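To make the Channel/EventLoop relationship concrete, here is a minimal sketch of a Netty client bootstrap. The host, port, and handler are placeholders chosen for illustration, not anything prescribed by Lettuce:

    import io.netty.bootstrap.Bootstrap;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.nio.NioSocketChannel;

    public class NettyClientSketch {
        public static void main(String[] args) throws InterruptedException {
            // One event loop group; each EventLoop in it is bound to exactly one
            // thread and can serve many Channels.
            EventLoopGroup group = new NioEventLoopGroup(1);
            try {
                Bootstrap bootstrap = new Bootstrap()
                        .group(group)
                        .channel(NioSocketChannel.class)
                        .handler(new ChannelInboundHandlerAdapter() {
                            @Override
                            public void channelRead(ChannelHandlerContext ctx, Object msg) {
                                // I/O events for this Channel always fire on the
                                // single thread of its EventLoop.
                                System.out.println("read on " + Thread.currentThread().getName());
                            }
                        });
                // Non-blocking connect; the future completes on the EventLoop thread.
                ChannelFuture f = bootstrap.connect("localhost", 6379).sync();
                f.channel().close().sync();
            } finally {
                group.shutdownGracefully();
            }
        }
    }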

Principle

Lettuce implementation principle and Redis pipeline mode

Although a single Netty EventLoop can serve multiple socket connections, Lettuce can support most concurrent business requests over just one Redis connection – in other words, a Lettuce connection is thread-safe. This rests on the combination of the following factors:

  1. A Netty EventLoop is bound to a single thread, so concurrent requests from business threads are placed in the EventLoop's task queue and ultimately processed sequentially by that thread. Lettuce itself also maintains a queue: when a command is successfully sent to Redis through the EventLoop, it is appended to this queue; when a response arrives from the server, Lettuce takes the corresponding command from the head of the queue in FIFO order for follow-up processing (see the sketch after this list).
  2. The Redis server is itself based on the NIO model and uses a single thread to handle client requests. Although Redis can maintain hundreds or thousands of client connections simultaneously, at any given moment the requests on one client connection are processed and answered in order.
  3. The client and server communicate over TCP, and the TCP protocol itself guarantees the order of data transmission.
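The queue mechanism in point 1 can be sketched as follows. This is a simplified illustration of the idea, not Lettuce's actual internal code; the class and method names are invented for the example:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentLinkedDeque;

    // Simplified sketch: commands are enqueued in send order and completed in
    // FIFO order as responses arrive. This is valid because Redis answers the
    // requests on one connection strictly in order.
    class CommandQueueSketch {
        static class PendingCommand {
            final String name;
            final CompletableFuture<String> response = new CompletableFuture<>();
            PendingCommand(String name) { this.name = name; }
        }

        private final ConcurrentLinkedDeque<PendingCommand> pending = new ConcurrentLinkedDeque<>();

        // Called after a command has been successfully written to the socket.
        CompletableFuture<String> onCommandSent(String name) {
            PendingCommand cmd = new PendingCommand(name);
            pending.addLast(cmd);
            return cmd.response;
        }

        // Called when a response has been decoded from the socket.
        void onResponseReceived(String payload) {
            PendingCommand cmd = pending.pollFirst(); // head of queue = oldest command
            if (cmd != null) {
                cmd.response.complete(payload);
            }
        }
    }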


In this way, Lettuce naturally interacts with Redis in pipeline mode (pipelining) while still guaranteeing the order of request processing: under concurrent requests from multiple business threads, the client does not have to wait for the server's response to the current request before issuing the next request on the same connection. This speeds up Redis request processing and makes efficient use of the full-duplex nature of the TCP connection. In contrast, unless pipeline mode is explicitly requested, Jedis can only issue the next request on a connection after the response to the current request has been processed – a difference between Lettuce and Jedis that is, to some extent, similar to the difference between HTTP/2 and HTTP/1. Readers can refer to "Introduction to HTTP/2" for the implementation principles of HTTP/2, which this article will not cover in detail.
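For comparison, explicit pipelining in Jedis looks roughly like the following sketch (assuming a local Redis at the default port; the keys are illustrative). Outside of this explicit mode, each Jedis call holds the connection until its response has been read:

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Pipeline;
    import redis.clients.jedis.Response;

    public class JedisPipelineSketch {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                Pipeline pipeline = jedis.pipelined();
                // Queue several commands without waiting for individual responses.
                Response<String> v1 = pipeline.get("key1");
                Response<String> v2 = pipeline.get("key2");
                pipeline.sync(); // flush the batch and read all responses at once
                System.out.println(v1.get() + ", " + v2.get());
            }
        }
    }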

What is pipeline mode

Pipelining is discussed in detail in the Redis documentation. The general idea: the client and server are connected over a network, and whether latency is high or low, it always takes some time for a packet to travel from the client to the server (request) and back from the server to the client (response). This period is called the RTT (Round Trip Time). Suppose that under very high-latency network conditions the RTT reaches 250 ms; then even if the server can process 100k requests per second, the overall throughput over a single connection is only 1 / 0.25 s = 4 requests per second. With pipelining, the client can issue a large batch of requests (say 1k) at once and then receive the batch of responses from the server at once, significantly improving request throughput. As shown below:

Lettuce's pipeline mode

Redis is a TCP server that uses a client-server model and the so-called request/response protocol. This means that typically requests are completed via the following steps:

  1. The client sends a query to the server and reads from the socket (usually in a blocking manner) to get the server response.
  2. The server processes the command and sends the response back to the client.

A request/response server can be implemented so that it is able to handle new requests even if the client has not yet read the old response. This makes it possible to send multiple commands to the server without waiting for a reply, and read the reply in one final step.

With the synchronous API, program flow generally blocks until the response is complete, while the underlying connection stays busy sending requests and receiving their responses. In this case, blocking applies only from the perspective of the current thread, not from a global perspective.

To understand why using the synchronous API does not block at the global level, we need to understand what this means. Lettuce is a non-blocking, asynchronous client. It provides a synchronous API that creates blocking behavior on a per-thread basis, so that the calling thread can wait for its command's response; this blocking does not affect other threads. Lettuce is designed to operate in a pipelining fashion: multiple threads can share one connection, and while one thread is waiting on a command, another thread can send a new command. As soon as the first request returns, the first thread's program flow continues, while the second request is processed by Redis and its response returned at some later point in time.
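A sketch of what per-thread blocking over a shared connection looks like with Lettuce's synchronous API (the Redis URI and key are assumptions for the example):

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.api.sync.RedisCommands;

    public class SharedConnectionSketch {
        public static void main(String[] args) throws InterruptedException {
            RedisClient client = RedisClient.create("redis://localhost:6379");
            // A single thread-safe connection shared by all business threads.
            StatefulRedisConnection<String, String> connection = client.connect();
            RedisCommands<String, String> sync = connection.sync();

            Runnable task = () -> {
                // Each get() blocks only the calling thread until its own
                // response arrives; other threads keep using the connection.
                System.out.println(Thread.currentThread().getName() + " -> " + sync.get("key"));
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start(); t2.start();
            t1.join(); t2.join();

            connection.close();
            client.shutdown();
        }
    }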

Lettuce is built on Netty, decouples reading from writing, and provides thread-safe connections. As a result, reading and writing can be handled by different threads, and commands are written and read independently of each other, but in sequence. (The Lettuce wiki has more details on command ordering rules in single-threaded and multi-threaded arrangements.) The transport and command-execution layers do not stay blocked while a command is written, processed, and its response read – Lettuce sends a command at the moment it is invoked.

The asynchronous API is a good example. Every invocation on the asynchronous API returns a Future (a response handle) once the command has been written to the Netty pipeline; a write to the pipeline does not mean the command has been written to the underlying transport. Multiple commands can be written without waiting for their responses, and calls to the API (synchronous, asynchronous, and, starting with version 4.0, the reactive API) can be made from multiple threads.
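A sketch using the asynchronous API; the manual command-flushing calls (setAutoFlushCommands / flushCommands) are the batching controls described in the Lettuce "Pipelining and command flushing" wiki page listed in the references. The URI and keys are illustrative:

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.RedisFuture;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.api.async.RedisAsyncCommands;
    import java.util.ArrayList;
    import java.util.List;

    public class AsyncPipelineSketch {
        public static void main(String[] args) {
            RedisClient client = RedisClient.create("redis://localhost:6379");
            StatefulRedisConnection<String, String> connection = client.connect();
            RedisAsyncCommands<String, String> async = connection.async();

            // Queue commands locally instead of flushing each one to the transport.
            connection.setAutoFlushCommands(false);
            List<RedisFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                futures.add(async.set("key-" + i, "value-" + i)); // returns immediately
            }
            // Write the whole batch to the transport in one go.
            connection.flushCommands();
            // Each future completes as its response is read back, in order.
            futures.forEach(f -> f.toCompletableFuture().join());

            connection.setAutoFlushCommands(true);
            connection.close();
            client.shutdown();
        }
    }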

Performance comparison between Lettuce and Jedis


The figure above shows how Jedis interacts with Redis. At first glance it is not easy to say which mode of operation performs better under highly concurrent business threads: the former interacts with Redis in pipeline mode over a single shared connection, while the latter issues concurrent operations to Redis through the connection pool it maintains. Let us first analyze it from the Redis server's point of view: at the same client request rate, the pipelined interaction has a definite advantage. Here is a quote from the Redis documentation "Using pipelining to speed up Redis queries":

Pipelining is not just a way to reduce the latency cost associated with the round trip time, it actually greatly improves the number of operations you can perform per second in a given Redis server. This is the result of the fact that, without using pipelining, serving each command is very cheap from the point of view of accessing the data structures and producing the reply, but it is very costly from the point of view of doing the socket I/O. This involves calling the read() and write() syscall, that means going from user land to kernel land. The context switch is a huge speed penalty.

The gist of the above: pipelining not only reduces the latency impact of the network RTT, it also significantly increases the number of operations the Redis server can execute per second. Although serving a single command is very cheap from the point of view of accessing in-memory data and producing the reply, it is very costly from the point of view of socket I/O when a large stream of client requests is handled one by one without pipelining. Socket I/O involves the read and write system calls, which means Redis must cross from user mode into kernel mode over and over, and the resulting context switching is a huge speed penalty.

Context switching

According to "Computer Systems: A Programmer's Perspective", context switches occur when the kernel schedules different processes or threads in the system. For a process, the kernel maintains a context that is used to resume the process after it has been interrupted. The process context consists of a variety of objects: registers, the program counter, the user stack, the kernel stack, and various kernel data structures (such as the address-space page table and the file table). When a program makes a system call in user mode (such as the socket I/O operations mentioned earlier), the kernel may, to avoid blocking, interrupt the current process through the context-switch mechanism and schedule another, previously interrupted, process. This involves: 1. saving the context of the current process, 2. restoring the saved context of the previously interrupted process, and 3. handing execution to the restored process. Furthermore, even if the system call does not block, the kernel may choose to perform a context switch instead of returning control to the calling process after the system call completes.


Clearly, a process context switch is a relatively heavy operation. Threads running in the same process share the process context, and a thread's own context is much smaller than a process's, so a thread context switch (within one process) is faster than a process context switch. Even so, because it still involves crossing back and forth between user mode and kernel mode, as well as flushing CPU caches, the performance loss from intensive thread context switching cannot be ignored. This is why many frameworks and programming languages try hard to avoid it: Netty's EventLoop follows the Java NIO model and is bound to a single thread; spin locks are enabled by default since JDK 6 to reduce the cost of thread switching; and the Go language uses goroutines instead of threads to improve concurrency performance.

Back to the topic: although the Redis server itself processes client requests in a single-threaded NIO mode – already a considerable improvement in context switching and memory management over the traditional BIO approach of one thread per client connection – there is still room to improve server performance under highly concurrent requests. According to the official documentation, in pipeline mode a single read system call can fetch many commands and a single write system call can deliver many responses, compared with one read or write system call per client request; this further reduces the context-switching overhead of serving requests. The number of requests Redis processes per second grows nearly linearly as the pipeline lengthens (that is, as the number of commands per pipeline increases), eventually reaching roughly 10 times the non-pipelined throughput.

Test

We used the JMH (Java Microbenchmark Harness) framework to simulate highly concurrent business request scenarios and ran performance tests of Jedis and Lettuce against a local Redis service on a multi-core processor (due to limited resources, the client and server ran on the same machine, but this has little impact on the reference value of the data). With 200 concurrent threads we tested the Jedis connection pool mode, the Lettuce single-connection mode, the Lettuce connection pool mode, and the Lettuce multi-connection mode (explained further below). See the appendix for details of the benchmark code. The results are averaged over multiple runs, as shown in the figure below:

The vertical axis of the figure lists the client usage patterns we tested, and the horizontal axis shows the corresponding QPS.
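The full benchmark code is in the appendix; as a rough illustration, a JMH benchmark for the Lettuce single-connection case looks something like this (the annotations, URI, and key are illustrative, not the actual appendix code):

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.api.sync.RedisCommands;
    import org.openjdk.jmh.annotations.*;
    import java.util.concurrent.TimeUnit;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    @Threads(200) // simulate 200 concurrent business threads
    public class LettuceSingleConnectionBenchmark {
        private RedisClient client;
        private RedisCommands<String, String> sync;

        @Setup
        public void setup() {
            client = RedisClient.create("redis://localhost:6379");
            StatefulRedisConnection<String, String> connection = client.connect();
            sync = connection.sync(); // one shared, thread-safe connection
        }

        @Benchmark
        public String get() {
            return sync.get("benchmark-key");
        }

        @TearDown
        public void tearDown() {
            client.shutdown();
        }
    }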

First, the Lettuce single-connection mode discussed in detail above does not use the parallelism of a multi-core processor when interacting with Redis, yet thanks to pipelining it performs well with only a single shared connection. Jedis reached its best performance with a pool of 50 connections, at roughly 90k QPS, surpassing the Lettuce single-connection mode; but when the pool grew to 200 connections, equal to the number of concurrent business threads in the test, its performance dropped sharply to the bottom of the ranking.

Using the top and ps commands, we observed Jedis's CPU usage while testing with different numbers of connections. With 50 connections the CPU metrics stayed at a fairly balanced level; with 200 connections Jedis's CPU usage rose sharply, with more than 90% of CPU time spent in kernel mode. On closer analysis, since our test uses 200 concurrent threads, a pool of 200 connections means that every thread can hold a connection and talk to Redis at the same time. This resembles the BIO model, where the server allocates a dedicated thread to each client connection: once the number of concurrent threads grows past a certain level, application performance drops significantly because of the frequent trips into kernel mode for thread context switches.

Does Lettuce need a connection pool?

Judging from the test data, Lettuce's overall performance in connection pool mode is poor. The reason: in connection pool mode the Lettuce connection becomes thread-confined – after a business thread takes a connection from the pool, it performs its Redis reads and writes through that connection and returns it to the pool when done; during that time no other thread can use the connection. This is the same principle as the Jedis pool. The difference is that a Jedis connection is not thread-safe, whereas a Lettuce connection is (as analyzed in detail above). So for Lettuce, the thread confinement a pool provides is unnecessary in most cases, and pooling prevents a connection from being shared by multiple threads, ruling out the more efficient pipelined interaction with Redis.
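For reference, the pooled (thread-confined) usage exercised in the test can be sketched with Lettuce's commons-pool2 integration; the pool size, URI, and key are illustrative:

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import io.lettuce.core.support.ConnectionPoolSupport;
    import org.apache.commons.pool2.impl.GenericObjectPool;
    import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

    public class LettucePoolSketch {
        public static void main(String[] args) throws Exception {
            RedisClient client = RedisClient.create("redis://localhost:6379");
            GenericObjectPoolConfig<StatefulRedisConnection<String, String>> config =
                    new GenericObjectPoolConfig<>();
            config.setMaxTotal(50);
            GenericObjectPool<StatefulRedisConnection<String, String>> pool =
                    ConnectionPoolSupport.createGenericObjectPool(client::connect, config);

            // Borrow a connection: it is confined to this thread until returned,
            // so no other thread can pipeline commands onto it in the meantime.
            try (StatefulRedisConnection<String, String> connection = pool.borrowObject()) {
                connection.sync().get("key");
            } // close() returns the connection to the pool

            pool.close();
            client.shutdown();
        }
    }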

Finally, consider Lettuce's multi-connection mode, which ranks first in the benchmark. Both the multi-connection mode and the pool mode use multiple connections, but in multi-connection mode the connections are not thread-confined: several business threads can use the same connection simultaneously (Lettuce connections are thread-safe). As shown in the figure below, compared with the pool mode of Jedis or Lettuce, the multi-connection mode keeps the ability to interact with Redis in pipeline mode; compared with Lettuce's single-connection mode, it exploits the parallelism of a multi-core processor. In our tests, with the number of connections set to the number of processor cores (8), the Lettuce multi-connection mode balanced pipelining and parallelism and showed the best performance. However, Lettuce does not yet ship this mode, and we hope later versions will support it.
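Since Lettuce does not ship such a mode, the benchmark used a simple wrapper along these lines; this class is our own construction for the test, not a Lettuce API:

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of a "multi-connection" client: a fixed set of thread-safe
    // connections shared by all business threads in round-robin fashion. Each
    // connection still benefits from pipelining, while writes spread across
    // several sockets (and EventLoop threads).
    public class MultiConnectionSketch {
        private final StatefulRedisConnection<String, String>[] connections;
        private final AtomicLong counter = new AtomicLong();

        @SuppressWarnings("unchecked")
        public MultiConnectionSketch(RedisClient client, int size) {
            connections = new StatefulRedisConnection[size];
            for (int i = 0; i < size; i++) {
                connections[i] = client.connect();
            }
        }

        public String get(String key) {
            int index = (int) (counter.getAndIncrement() % connections.length);
            return connections[index].sync().get(key); // connection stays shared, not borrowed
        }
    }

In the test, size was set to the number of processor cores (8).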

Tuning

1. Remove the connection pool configuration: Lettuce is thread-safe by design, which is sufficient for most situations. Since the Redis server executes all user operations on a single thread, using multiple connections has no positive impact on application performance. Blocking operations usually go hand in hand with worker threads acquiring private connections, and Redis transactions are the typical use case for dynamic connection pooling, since the number of threads requiring dedicated connections tends to vary. In other words, the need for dynamic connection pooling is limited, while connection pooling always brings complexity and maintenance cost.

2. Enable periodic refresh of the cluster topology in cluster mode: if adaptive refresh is not enabled, connection errors will occur when the Redis cluster topology changes.

    import java.time.Duration;
    import io.lettuce.core.TimeoutOptions;
    import io.lettuce.core.cluster.ClusterClientOptions;
    import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
    import io.lettuce.core.resource.ClientResources;
    import io.lettuce.core.resource.DefaultClientResources;
    import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;

    // Cluster topology refresh: periodic refresh every 30s plus all adaptive triggers
    ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(30))
            .enableAllAdaptiveRefreshTriggers()
            .build();
    ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
            // Redis command timeout; after a timeout the connection is
            // re-established using the new topology information
            .timeoutOptions(TimeoutOptions.enabled(Duration.ofSeconds(10)))
            .topologyRefreshOptions(topologyRefreshOptions)
            .build();

    // Shared client resources (event loops etc.) for the Lettuce client
    ClientResources clientResources = DefaultClientResources.create();
    LettuceClientConfiguration clientConfiguration = LettuceClientConfiguration.builder()
            .clientResources(clientResources)
            .clientOptions(clusterClientOptions)
            .build();

3. Turn off cluster node membership validation:

  1. Lettuce maintains a local copy of the information returned by the cluster nodes as a routing table. validateClusterNodeMembership is a Lettuce client option that checks whether the address a command targets is present in this routing table; it defaults to true, meaning the check is enabled.
  2. When Lettuce connects to a Redis cluster without topologyRefreshOptions configured, the routing table from step 1 is not updated after the cluster topology changes.
  3. When a slot moves, the server replies with MOVED xxx xxx, telling the client to access the new address xxx; but because that address is not in the stale routing table, the check from step 1 rejects it and an error is raised.
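The check is turned off through ClusterClientOptions, along these lines:

    ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
            // do not validate that target addresses are in the local routing table
            .validateClusterNodeMembership(false)
            .build();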

References

https://github.com/lettuce-io/lettuce-core/wiki/Pipelining-and-command-flushing

https://baijiahao.baidu.com/s?id=1748466935749639220&wfr=spider&for=pc
