29 | Service Mesh: How do you shield the service governance details of a service-oriented system?

In the previous lessons of the distributed services chapter, we covered which middleware to use to solve inter-service communication and service governance problems during a microservices transformation, including:

Using an RPC framework to solve inter-service communication;

Using a registry to solve service registration and discovery;

Using distributed tracing middleware to troubleshoot slow requests that span service calls;

Using load balancers to solve service scalability;

Embedding service governance strategies such as circuit breaking, degradation, and rate limiting into the API gateway.

After going through these steps, your vertical e-commerce system has basically completed its microservice transformation. At present, however, the system is still written mainly in Java, and the service governance strategies and inter-service communication protocols mentioned above are all implemented in Java as well.

This raises a problem: once a few small teams in your organization start building new microservices in Go or PHP, they are bound to run into challenges along the way.

Challenges brought by cross-language systems

In fact, it is fairly common for different teams in a company to use different development languages. For example, Weibo's main development languages are Java and PHP, and in recent years some systems have been built in Go. Microservices written in different languages face two challenges when calling each other:

On the one hand, the communication protocol between services must be friendly to multiple languages. The key to cross-language calls is choosing an appropriate serialization format. Consider an example.

Suppose you build an RPC service in Java using Java's native serialization. That format is unfriendly to other languages: callers written in other languages will find it difficult to parse the serialized binary stream. So my advice is, when choosing a serialization protocol, consider whether it is multi-language friendly. Protobuf or Thrift, for example, are good choices, and with them the cross-language calling problem is easily solved.

On the other hand, microservices written in a new language cannot reuse the service governance strategies you have already accumulated. For example, when the RPC client subscribes to services through the registry, it generally caches node data to avoid contacting the registry on every RPC call. When service nodes in the registry change, the RPC client is notified and updates its node cache.

Moreover, to reduce the load on the registry, we generally use a multi-level cache (a memory cache plus a file cache) on the RPC client to keep the node cache available. These strategies were originally implemented in Java and encapsulated in the registry client for RPC clients to use. Switching to a new language means re-implementing all of this logic in that language.

In addition, load balancing, circuit breaking and degradation, rate limiting, printing distributed tracing logs, and other service governance strategies would all need to be re-implemented, and re-implementing them in another language is undoubtedly a huge amount of work. This is a major pain point in middleware development.

So, how do you shield the details of service governance in a service-oriented architecture, or in other words, how do you make service governance strategies reusable across multiple languages?

You can consider splitting the service governance details out of the RPC client into a separately deployed proxy layer. This proxy layer can be implemented in a single language, and all traffic passes through it so that the governance policies apply. This is an application of "separation of concerns", and it is also the core idea of Service Mesh.

How Service Mesh works

1. What is Service Mesh?

Service Mesh mainly handles communication between services. Its main implementation form is to deploy an agent process alongside the application on the same host. This agent is generally called a "Sidecar", and communication between services changes from a direct client-to-server connection to the following form:

In this form, the RPC client first sends its packets to the Sidecar deployed on the same host. After service discovery, load balancing, service routing, and flow control in that Sidecar, the data is sent on to the Sidecar of the target service node. There, after recording access logs, recording distributed tracing logs, and applying rate limiting, the data is finally delivered to the RPC server.

This approach isolates business code from service governance strategies, sinking governance into an independent foundational module. Not only can governance strategies then be reused across languages, the Sidecars themselves can also be managed in a unified way.

Currently, the most discussed Service Mesh solution in the industry is Istio. Its architecture looks like this:

It divides components into a data plane and a control plane. The data plane is the Sidecar I mentioned (Istio uses Envoy as its Sidecar implementation). The control plane is mainly responsible for executing service governance policies; in Istio it consists of three parts: Mixer, Pilot, and Istio-Auth.

You don't need to understand each component's role for now; just know that together they make up the service governance system.

However, in Istio every request has to go through the control plane: each request calls Mixer across the network, which hurts performance considerably.

Therefore, the Service Mesh solutions open-sourced by major Chinese companies generally only borrow Istio's data plane/control plane idea: they implement the governance strategies inside the Sidecar, and the control plane is only responsible for distributing policies. Since requests no longer pass through the control plane, performance improves substantially.

2. How to forward traffic to the Sidecar

In the implementation of Service Mesh, a major issue is how to introduce Sidecar as a network proxy as imperceptibly as possible. In other words, whether data is flowing in or out, the data packet must be redirected to the Sidecar port. There are generally two implementation ideas:

The first is to use iptables for transparent traffic forwarding; Istio uses iptables by default to implement packet redirection. To explain the forwarding principle more clearly, let's briefly review what iptables is.

iptables is the user-space management tool for Netfilter, the firewall framework in the Linux kernel, and it controls Netfilter's address translation features. iptables has five default chains, which you can think of as five stages in a packet's journey: PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. The general flow of a packet looks like this:

As the figure shows, incoming packets enter through the PREROUTING chain, while packets generated by the local machine flow through the OUTPUT chain. Therefore, we can attach rules to these two chains to redirect packets. Let me use Istio as an example to show how iptables implements traffic forwarding.

Istio ships a script called "istio-iptables.sh". It runs when the Sidecar is initialized and mainly sets up iptables rules.

I’ve excerpted some key points to illustrate:

# Outbound traffic handling
iptables -t nat -N ISTIO_REDIRECT                                              # Create the ISTIO_REDIRECT chain for outbound traffic
iptables -t nat -A ISTIO_REDIRECT -p tcp -j REDIRECT --to-port "${PROXY_PORT}" # Redirect traffic to the Sidecar port
iptables -t nat -N ISTIO_OUTPUT                                                # Create the ISTIO_OUTPUT chain for outbound traffic
iptables -t nat -A OUTPUT -p tcp -j ISTIO_OUTPUT                               # Send OUTPUT chain traffic to the ISTIO_OUTPUT chain
for uid in ${PROXY_UID}; do
    iptables -t nat -A ISTIO_OUTPUT -m owner --uid-owner "${uid}" -j RETURN    # Do not forward the Sidecar's own traffic
done
for gid in ${PROXY_GID}; do
    iptables -t nat -A ISTIO_OUTPUT -m owner --gid-owner "${gid}" -j RETURN    # Do not forward the Sidecar's own traffic
done
iptables -t nat -A ISTIO_OUTPUT -j ISTIO_REDIRECT                              # Send remaining ISTIO_OUTPUT traffic to ISTIO_REDIRECT

# Inbound traffic handling
iptables -t nat -N ISTIO_IN_REDIRECT                                              # Create the ISTIO_IN_REDIRECT chain for inbound traffic
iptables -t nat -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-port "${PROXY_PORT}" # Redirect inbound traffic to the Sidecar port
iptables -t ${table} -N ISTIO_INBOUND                                              # Create the ISTIO_INBOUND chain for inbound traffic
iptables -t ${table} -A PREROUTING -p tcp -j ISTIO_INBOUND                        # Send PREROUTING traffic to the ISTIO_INBOUND chain
iptables -t nat -A ISTIO_INBOUND -p tcp --dport "${port}" -j ISTIO_IN_REDIRECT    # Send traffic for the given destination port to ISTIO_IN_REDIRECT

Assuming the service is deployed on port 9080 and the Sidecar listens on port 15001, inbound traffic flows as follows:

The traffic diagram of outbound traffic is as follows:

The advantage of the iptables approach is that it is completely transparent to the business; the business doesn't even know the Sidecar exists, which lowers the cost of adoption. But it has a flaw: it incurs a performance penalty under high concurrency. That is why major Chinese companies have adopted another approach: the lightweight client.

In this approach, the RPC client learns the Sidecar's port from configuration and sends its service calls to the Sidecar through a lightweight client. Before forwarding a request, the Sidecar first applies some service governance strategies, for example querying node information from the registry and caching it, then picking a node using some load balancing strategy.

After the request reaches the server-side Sidecar, that Sidecar records access logs and distributed tracing logs, then forwards the request to the real service node. Of course, when the service node starts, it delegates registration with the registry to its Sidecar, which is how the Sidecar knows the port the real service is deployed on. The whole request flow is shown in the figure:

Of course, besides iptables and the lightweight client, another solution being explored is Cilium, which forwards requests at the socket level and thus avoids iptables' performance loss. Among these, I suggest the lightweight client approach: although it carries some modification cost, it is the simplest to implement and lets you land Service Mesh in your project quickly.

Of course, whichever method you use, you can deploy a Sidecar on the call path between client and server and let it proxy inbound and outbound traffic, so that you can use the service governance policies running in the Sidecar. As for those strategies themselves, we covered them in earlier lessons (you can review lessons 23 to 26), so I won't repeat them here.

At the same time, I also recommend learning about the open-source Service Mesh frameworks currently in the industry, so that you have more options when choosing a solution. The relatively mature open-source Service Mesh frameworks include the following; you can read their documentation as an extension of this lesson.

1. Istio is the best-known framework in the industry. It proposed the concepts of data plane and control plane and is the pioneer of Service Mesh. Its flaw is the Mixer performance issue mentioned above.

2. Linkerd is the first generation of Service Mesh, written in Scala. Its disadvantage is its high memory usage.

3. SOFAMesh is an open source Service Mesh component of Ant Financial, and Ant Financial has already had large-scale implementation experience.

Course Summary

In this lesson, to solve the problem of reusing service governance strategies across languages, we looked at what Service Mesh is and how to land it in real projects. The key points are as follows:

1. Service Mesh is divided into a data plane and a control plane. The data plane is mainly responsible for transmitting data; the control plane controls how service governance policies are applied. For performance reasons, governance policies are generally implemented inside the data plane, with the control plane responsible only for distributing policy data.

2. There are currently two main ways to introduce the Sidecar: one hijacks traffic with iptables; the other forwards traffic through a lightweight client.

Currently, major companies such as Weibo and Ant Financial have begun landing Service Mesh in many real projects, and I recommend you keep following this technology. It is, at heart, a technique for separating business logic from communication infrastructure. If your business faces service governance difficulties in a multi-language environment, if you have legacy services that need governance strategies added quickly, or if you want to quickly share your service governance experience with other teams, Service Mesh is a good choice.