kafka: partitioning strategy for producers to send messages

In order to increase the scalability of the system, Kafka introduces the concept of partitioning.

The partitioning mechanism in Kafka refers to dividing each topic into multiple partitions (Partitions), and each partition is an ordered set of message logs. Each message under a topic will only be saved in a certain partition, and will not be saved in multiple partitions.

Through this design, data read and write operations can be performed at the granularity of partitions. Each partition of each Broker processes requests independently, thereby achieving load balancing and improving the throughput of the overall system.

A partitioning strategy is an algorithm that determines which partition a producer sends messages to.

1. Default partitioner

Kafka has a data distribution strategy when producing data. The DefaultPartitioner.class class is used by default. This class defines the data distribution strategy.

kafka default partitioner: org.apache.kafka.clients.producer.internals.DefaultPartitioner Website: yii666.com

Using the default partitioner, when the producer creates a message, it decides which partition to send it to based on the parameters:

1.1, sticky partition strategy (polling before 2.4.0) – unspecified partition, key

Kafka Learning (4): Partitioning Strategy for Producers to Send Messages Article address https://www.yii666.com/blog/358380.html

When there is neither a partition value nor a key value, Kafka uses Sticky Partition (sticky partitioner) , will randomly select a partition and use it as long as possible. When the partition’s batch is full or completed, Kafka will randomly select another partition to use (and The partition was different last time).

Sticky Partitioning Strategy will randomly select a partition and stick to that partition as much as possible – the so-called sticky partition.

reason:

Kafka uses a batch processing scheme when sending messages, and distributes them after a batch is reached. However, if a batch of data contains data from different partitions, it cannot be placed into a batch. However, the polling scheme in the old version This will cause a batch of data to be divided into multiple small batches, thus affecting efficiency. Therefore, in the new version, this sticky division strategy is adopted. Article source address https://www.yii666.com/blog/358380.html URL: yii666.com<

For example:

For the first time, partition number 0 is randomly selected, and the current batch of partition number 0 is full (default 16k) or linger.ms When the set time is up, Kafka will randomly use another partition (if it is still 0, it will continue to be randomized).

1.2, hash partition strategy

Kafka Learning (4): Partitioning Strategy for Producers to Send Messages

If the partition value is not specified but there is a key, compare the hash value of key with the topic The partition number is taken as remainder to obtain the partition value. Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions

Note: If the key remains unchanged, the hash value calculated for the same key is a fixed value. If it is a fixed value, this hash modulo is meaningless.
For example:

key1’shashvalue=5, key2’shashvalue< /strong>=6, the number of partitions of topic=2, thenkey1 ** The corresponding value1 is written to partition number 1, and the corresponding value2 of key2 is written to 0< /strong> partition.

1.3. Specify partition strategy

The above two structures will perform data distribution operations through DefaultPartitioner. But after specifying a partition, the DefaultPartitioner.partition() method will not be called.

When partition is specified, the specified value is directly used as the partition value; for example, partition=0, all data is written to partition 0. Article source address: https://www.yii666.com/blog/358380.html

2. Customized partition strategy

The custom partition strategy is implemented in the same way as DefaultPartitioner.

1. Create a class to implement the Partitioner interface.

2. Rewrite the methods in partitioner,

Parameter description of partitioner() method:

Parameter 1: topic

Parameter 2: key value

Parameter 3: key value byte array

Parameter 4: value data

Parameter 5: byte array of value data

Parameter 6: Cluster object

3. Write custom partitioning logic in the partitioner() method and return the partition number.

4. Configure custom partitions in the producer configuration information:

spring.kafka.producer.properties.partitioner.class=Full path to configuration class

Code example:

@Component public class MyPartitioner implements Partitioner { @Override public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) { String msgValues = value.toString(); int partition; if (msgValues.contains("test")){ partition = 0; }else { partition = 1; } return partition; } @Override public void close() { //Nothing to close } @Override public void configure(Map<String, ?> configs) { } }

The knowledge points of the article match the official knowledge archives, and you can further learn relevant knowledge. Cloud native entry-level skills treeContainer orchestration (learning environment k8s)Install kubectl16953 people are learning the system

1. Default partitioner

1.1, sticky partition strategy (polling before 2.4.0) – unspecified partition, key

1.2, hash partition strategy

1.3. Specify partition strategy

2. Customized partition strategy