The ultimate solution to buried logs: Golang + Gin + Sarama vs. Java + Spring WebFlux + Reactor Kafka

I have previously written several articles about using OpenResty + lua-kafka-client to write buried-point data to Kafka:

Lua writes Nginx request data to Kafka – buried log solution

Python scheduled task executes shell script to cut Nginx logs – use with caution

nginx + lua writes to kafka and reports buffered messages send to kafka err: not found broker

About OpenResty + doujiang24/lua-resty-kafka writing kafka failover simulation test

Each of the steps above had its pitfalls. Some we hit because we were not skilled enough, some we hit while solving a particular problem, and for a while things finally quieted down. Then a new problem arose, this time less urgent but more important.
By the usual script, with the pitfalls above eliminated, the service should not have needed any further changes. Yet a new problem appeared: the base image used for containerized deployment had to be upgraded from Debian 10 to Debian 11. That is a major version change; minor versions are upgraded almost every week, and for those upgrades the project team is not notified for testing, operations simply upgrades directly.

During the Debian 10 to Debian 11 upgrade, we hit a problem that is frightening in hindsight: the signature of one of the methods in zlib changed, which broke our Lua script. Discovering this raised a concern: when operations rolls out a minor-version upgrade, might it upgrade one of the runtime libraries we use and cause a production incident? After evaluation we concluded this was quite likely, because operations bases its image upgrades on an image-scanning tool that frequently flags components such as OpenSSL and requires the upgrade to be completed within a month, which could easily affect us.

Moreover, when we upgraded the Kafka server, we found that doujiang24's kafka client cannot keep up with a community-maintained client in terms of active updates and support for new Kafka features, and it is hard to get help when problems arise (I did get a reply from him last time, but one person will always lag behind a whole community answering questions).

Solution ideas

Idea 1

Ask operations to notify us, since we are sensitive to operating-system components, before each upgrade, so that we can evaluate whether to skip it. This is hard to act on, because our project team cannot accurately judge which components will actually affect us. We decided not to use this idea.

Idea 2

Move the processing that depends on the affected underlying components, such as gzip and AES, out of Nginx and into Flink behind Kafka. This would avoid most of the impact of underlying upgrades, but it cannot avoid the Kafka driver upgrade problem. We decided not to use this idea.

Idea 3

Our other services are all written in Java, and precisely because of the JVM layer they are not afraid of operating-system upgrades. Could we implement this service in Java to avoid the problem? This would address the concerns above, but the performance needs to be tested: even with NIO, reaching the current TPS would still take a fair amount of resources, because there is a big gap in memory usage between OpenResty and Java at the same TPS. This idea was kept for further testing.

Idea 4

Is it possible to keep resource usage low and performance high, while using an intermediate layer to shield us from operating-system component upgrades? That is when I thought of Golang. This idea was also kept for further testing.

Idea 5

The original architecture of this project was OpenResty + Apache Flumed. Could we restore that architecture and move the processing now done in OpenResty into Flumed afterwards? This was also rejected, for the following reasons:

  1. If OpenResty and Flumed are deployed in the same container, the company's standard monitoring can only watch one of the processes, so if the other process dies it may go unnoticed. We have run into this before: a container ran one OpenResty and five Flumed processes, and one of the Flumed processes had been down for a long time before anyone noticed.
  2. If OpenResty and Flumed are deployed in separate containers, a network disk has to be mounted between them. That disk is unreliable and limited by network bandwidth, resulting in poor performance.
  3. There is also the question of how much data Flumed should buffer and read back. If the buffer is large, data is lost when the container restarts; if it is small, throughput suffers. Finding that balance point would consume a lot of time and resources.
    Because OpenResty and Flumed have problems whether or not they share a container, we decided not to use this idea.

Trials

Golang implementation
Go 1.20.10
Gin 1.9.1
Sarama 1.41.3

It took a few days to implement, and the preliminary performance test results were:
1 CPU core, 1 GB memory, 100 concurrent connections, 5 buried points per request: TPS 731.
Final CPU usage was 47% and memory usage was 0.93 GB.
When I first discussed this idea with the architect, I only said it was theoretically feasible and that I would give it a try; no one was sure. While gathering information I came across an article on Zhihu by You Paiyun (UpYun), "[Practical Sharing] Using Go to reconstruct the streaming log gateway". With a precedent of this idea making it to production successfully, my confidence grew considerably.
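
The Gin side of this trial is not shown in the post, so below is a minimal sketch of how a buried-point request might be accepted and handed to Kafka. The /track route, the topic name and the one-event-per-request payload are assumptions, and the sketch relies on the InitKafkaConfig, DestroyKafkaProducer and SendKafkaAsyncMessage helpers listed at the end of this article; it illustrates the idea rather than the actual service.

package main

import (
	"io"
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	// Producer setup and teardown use the Sarama helpers shown at the end of this article.
	if err := InitKafkaConfig(); err != nil {
		panic(err)
	}
	defer DestroyKafkaProducer()

	r := gin.Default()
	// Hypothetical endpoint: each request body carries a single buried-point event.
	r.POST("/track", func(c *gin.Context) {
		body, err := io.ReadAll(c.Request.Body)
		if err != nil || len(body) == 0 {
			c.Status(http.StatusBadRequest)
			return
		}
		// Hand the event to the async Sarama producer; this does not block on Kafka.
		SendKafkaAsyncMessage(string(body), "buried-log") // topic name is an assumption
		c.Status(http.StatusNoContent)
	})
	r.Run(":8080")
}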

Java implementation
Spring Boot 3.x (the Spring WebFlux stack, not Spring MVC)
Reactor Kafka 2.x (the version chosen mainly to match the kafka server)

1 CPU core, 2 GB memory, 100 concurrent connections, 5 buried points per request: TPS 430.
Final CPU usage was 60% and memory usage was 1.2 GB.

Conclusion

The test results of the OpenResty + Lua implementation were:
1 CPU core, 1 GB memory, 100 concurrent connections, 5 buried points per request: TPS 421.
Final CPU usage was 60% and memory usage was 0.6 GB.

Measured against the OpenResty solution, the Java implementation is not far off, while the Golang implementation shows a clear performance advantage. However, the company's supporting facilities for Golang projects, such as real-time monitoring, base images and Golang engineers, are not as complete as they are for Java projects. For now, the preliminary plan is to bring both implementations into the UAT environment and run load tests on dedicated pressure-testing machines.

I’ll add the results after I finish.

---------------------------------- 2023-11-01 update: start ----------------------------------
Additional test results (1 CPU core, 1 GB memory, 100,000 records):

Concurrency         TPS (Java)   TPS (Go)   CPU peak (Java)   CPU peak (Go)   Memory (Java)   Memory (Go)
100 (first run)     706          1016       100%              100%            <500M           <35M
100 (second run)    1021         1015       100%              100%            <500M           <35M
50                  954          933        90%               90%             <500M           <30M
75                  1004         973        84%               90%             <500M           <30M

Overall, there is not much difference in TPS, and the CPU usage of the Go implementation is slightly higher, but the Go implementation uses an order of magnitude less memory than the Java implementation, which genuinely saves memory.
Considering orchestration and release tooling, the depth of the company's internal technology stack, and peripheral facilities such as monitoring and logging, we chose the Java implementation.
If memory is tight, or you have technical depth in Golang, the Golang implementation is a good choice.
If memory is acceptable, and your technology stack and peripheral facilities are mainly Java-based, the Java implementation is a good choice.
---------------------------------- 2023-11-01 update: end ----------------------------------

The following is the Sarama code for sending messages to Kafka:

var KafkaProduce sarama.AsyncProducer

func InitKafkaConfig() error {
	config := sarama.NewConfig()
	// Producer configuration.
	// Wait only for the leader (local) broker to acknowledge each write.
	config.Producer.RequiredAcks = sarama.WaitForLocal
	// Distribute messages across partitions randomly.
	config.Producer.Partitioner = sarama.NewRandomPartitioner
	// Report successes and failures on the Successes()/Errors() channels.
	// This only matters because RequiredAcks above is not NoResponse.
	config.Producer.Return.Successes = true
	config.Producer.Return.Errors = true
	// KafkaClientIpList is a []string of kafka address:port entries, usually 3 of them.
	client, err := sarama.NewClient(KafkaClientIpList, config)
	if err != nil {
		return err
	}

	producer, err := sarama.NewAsyncProducerFromClient(client)
	if err != nil {
		return err
	}
	// This goroutine is required; without it the producer stops sending after a certain
	// number of messages. Messages written to KafkaProduce.Input() are buffered locally
	// before being sent to kafka, and if the Successes()/Errors() channels are never
	// drained that local buffer fills up and everything gets stuck.
	go func() {
		for {
			select {
			case <-producer.Successes():
			case er := <-producer.Errors():
				if er != nil {
					AccessLogger.Errorf("Produced message failure: %s", er)
				}
			}
		}
	}()
	KafkaProduce = producer
	return nil
}

func DestroyKafkaProducer() {
	if KafkaProduce != nil {
		KafkaProduce.Close()
	}
}

// Message sending.
func SendKafkaAsyncMessage(msg string, topic string) {
	// Write to kafka asynchronously.
	KafkaProduce.Input() <- &sarama.ProducerMessage{Topic: topic, Key: nil, Value: sarama.StringEncoder(msg)}
}
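
The configuration above stays close to Sarama's defaults. If more throughput were needed, Sarama also exposes batching, compression and retry knobs; the sketch below shows what that tuning could look like, with purely illustrative values that were not part of this test (it also requires the time package to be imported).

// Optional producer tuning, not used in the configuration above; the values are
// illustrative assumptions only.
func tuneProducerConfig(config *sarama.Config) {
	config.Producer.Compression = sarama.CompressionSnappy   // compress batches to reduce network traffic
	config.Producer.Flush.Frequency = 100 * time.Millisecond // flush buffered messages at least this often
	config.Producer.Flush.Messages = 500                     // ...or once this many messages have accumulated
	config.Producer.Retry.Max = 3                            // retry transient broker errors a few times
}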