Apache Pulsar cluster deployment and setup

1. Deployment planning

Deploying a Pulsar cluster involves the following steps (in order):

  1. Deploy a ZooKeeper cluster and initialize the Pulsar cluster metadata.
  2. Deploy a BookKeeper cluster.
  3. Deploy one or more Pulsar brokers.
  4. Deploy Pulsar manager (optional).

2. Node planning

hostname      IP address      role             port number
zookeeper1    192.168.1.191   zookeeper        2181
zookeeper2    192.168.1.192   zookeeper        2181
zookeeper3    192.168.1.193   zookeeper        2181
bookkeeper1   192.168.1.194   bookkeeper       3181
bookkeeper2   192.168.1.195   bookkeeper       3181
bookkeeper3   192.168.1.196   bookkeeper       3181
pulsar1       192.168.1.147   broker           8080 (http protocol), 6650 (pulsar protocol)
pulsar2       192.168.1.148   broker           8080 (http protocol), 6650 (pulsar protocol)
pulsar3       192.168.1.149   broker           8080 (http protocol), 6650 (pulsar protocol)
pulsar1       192.168.1.147   pulsar-manager   7750

3. Download the binary package

Download the binary package of the Pulsar distribution, which contains the files required by ZooKeeper, BookKeeper, and Pulsar:

wget https://archive.apache.org/dist/pulsar/pulsar-2.7.1/apache-pulsar-2.7.1-bin.tar.gz

After the download completes, extract the package and enter the extracted directory:

tar xvzf apache-pulsar-2.7.1-bin.tar.gz
cd apache-pulsar-2.7.1

The extracted directory contains the following subdirectories:

directory   content
bin         Pulsar command-line tools, such as pulsar and pulsar-admin
conf        Configuration files for ZooKeeper, BookKeeper, Pulsar, etc.
data        Directory where ZooKeeper and BookKeeper store their data
lib         JAR files used by Pulsar
logs        Log directory

4. Deploy Zookeeper cluster

Modify the Zookeeper configuration file
Modify the conf/zookeeper.conf configuration file of all Zookeeper nodes:

# Set Zookeeper data storage directory.
dataDir=data/zookeeper

# Add a server.N line for each node in the configuration file, where N is the number of the ZooKeeper node.
server.1=192.168.1.191:2888:3888
server.2=192.168.1.192:2888:3888
server.3=192.168.1.193:2888:3888

Each ZooKeeper node needs a unique ID within the cluster, stored in a myid file in the directory specified by dataDir:

# create directory
mkdir -p data/zookeeper
# The ID of each ZooKeeper node must be unique and must match the N of its server.N line (1, 2, and 3 here); this example is for node 1
echo 1 > data/zookeeper/myid
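
Because the myid value has to match the N of that node's server.N line, it can also be derived from the configuration file instead of being typed by hand on each node. The following is a minimal sketch under that assumption; the function name myid_for_ip is hypothetical and is not part of Pulsar or ZooKeeper, and the node's own IP address is passed in explicitly:

```shell
# Hypothetical helper: print the N of the server.N line whose host equals the given IP.
# Usage: myid_for_ip <ip> <path to zookeeper.conf>
myid_for_ip() {
  # Split "server.N=host:port:port" on "=" and ":" so that $1 is "server.N" and $2 is the host.
  awk -F'[=:]' -v ip="$1" '
    $1 ~ /^server\.[0-9]+$/ && $2 == ip { sub(/^server\./, "", $1); print $1 }
  ' "$2"
}

# Example (run on node 192.168.1.192, which is server.2 in the configuration above):
#   mkdir -p data/zookeeper
#   myid_for_ip 192.168.1.192 conf/zookeeper.conf > data/zookeeper/myid
```

If the IP does not appear in any server.N line, the function prints nothing, so an empty myid file is a sign that the configuration and the node list disagree.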

Start the Zookeeper cluster
Start the Zookeeper service on each Zookeeper node:

bin/pulsar-daemon start zookeeper

Initialize cluster metadata
After the ZooKeeper cluster has started successfully, the Pulsar cluster metadata needs to be written to ZooKeeper. Because ZooKeeper replicates data across its nodes, it is enough to run the initialization once against a single ZooKeeper node:

bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper 192.168.1.191:2181 \
  --configuration-store 192.168.1.191:2181 \
  --web-service-url http://192.168.1.147:8080,192.168.1.148:8080,192.168.1.149:8080 \
  --broker-service-url pulsar://192.168.1.147:6650,192.168.1.148:6650,192.168.1.149:6650

The parameters are described as follows:

parameter               description
--cluster               Name of the Pulsar cluster
--zookeeper             ZooKeeper address; any one machine in the ZooKeeper cluster is sufficient
--configuration-store   Configuration store address; any one machine in the ZooKeeper cluster is sufficient
--web-service-url       URL and port of the Pulsar cluster web service; the default port is 8080
--broker-service-url    URL of the broker service, used to interact with the brokers in the Pulsar cluster; the default port is 6650
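
The two service URLs in the command above list every broker with the same port, in the form scheme://host1:port,host2:port,... A minimal sketch of how such a value could be assembled from a host list follows; the function name service_url is hypothetical and only builds the string, it does not talk to Pulsar:

```shell
# Hypothetical helper: join hosts into "<scheme>://h1:port,h2:port,...".
# Usage: service_url <scheme> <port> <host>...
service_url() {
  scheme=$1
  port=$2
  shift 2
  list=""
  for host in "$@"; do
    # Prepend a comma only when the list already has an entry.
    list="${list:+$list,}$host:$port"
  done
  printf '%s://%s\n' "$scheme" "$list"
}

service_url http 8080 192.168.1.147 192.168.1.148 192.168.1.149
service_url pulsar 6650 192.168.1.147 192.168.1.148 192.168.1.149
```

The two calls print the same values that are passed to the web service URL and broker service URL parameters in this guide.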

5. Deploy Bookkeeper cluster

All persistent data storage in a Pulsar cluster is handled by Bookkeeper.

Modify Bookkeeper configuration file
Modify the conf/bookkeeper.conf configuration file on all BookKeeper nodes and set the ZooKeeper addresses that the BookKeeper cluster connects to:

zkServers=192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181
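
Since the same setting has to be changed on every node, the edit can be scripted. The sketch below assumes the configuration file is a plain key=value file; the function name set_prop is hypothetical, and it avoids sed -i because that option behaves differently across platforms:

```shell
# Hypothetical helper: set key=value in a properties-style file,
# replacing the existing line for that key or appending a new one.
# Usage: set_prop <file> <key> <value>
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }  # replace the uncommented key line
    { print }                                                # keep every other line unchanged
    END { if (!done) print k "=" v }                         # append when the key was missing
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example:
#   set_prop conf/bookkeeper.conf zkServers 192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181
```

Commented-out lines such as "# zkServers=..." are left untouched, because only lines that start with the key itself are replaced.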

Start the Bookkeeper cluster
Start the Bookkeeper service on each Bookkeeper node:

bin/pulsar-daemon start bookie

Verify Bookkeeper cluster status
On any BookKeeper node, use the simpletest command of the BookKeeper shell to verify that all bookies in the cluster have started. The value 3 is the number of BookKeeper nodes:

bin/bookkeeper shell simpletest --ensemble 3 --writeQuorum 3 --ackQuorum 3 --numEntries 3

The parameters have the following meanings:

-a, --ackQuorum <arg>     Ack quorum size (default 2): number of bookie acknowledgements required before an entry is considered written
-e, --ensemble <arg>      Ensemble size (default 3): number of bookie nodes to write data to
-n, --numEntries <arg>    Entries to write (default 1000): number of entries written by the test
-w, --writeQuorum <arg>   Write quorum size (default 2): number of copies of each entry

This command creates a ledger whose ensemble spans as many bookies as specified (here, all bookies in the cluster), writes some entries to it, reads them back, and then deletes the ledger.
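
BookKeeper expects the three sizes to satisfy ensemble >= write quorum >= ack quorum, and the ensemble cannot be larger than the number of available bookies. A small sketch of that check follows; the function name check_quorum is hypothetical and is not a BookKeeper command:

```shell
# Hypothetical helper: succeed only if bookies >= ensemble >= writeQuorum >= ackQuorum >= 1.
# Usage: check_quorum <bookies> <ensemble> <writeQuorum> <ackQuorum>
check_quorum() {
  [ "$4" -ge 1 ] && [ "$3" -ge "$4" ] && [ "$2" -ge "$3" ] && [ "$1" -ge "$2" ]
}

# The simpletest values used above: 3 bookies, ensemble 3, write quorum 3, ack quorum 3.
check_quorum 3 3 3 3 && echo "simpletest settings are consistent"

# An invalid combination: the write quorum is larger than the ensemble.
check_quorum 3 2 3 2 || echo "write quorum larger than ensemble: rejected"
```

Running such a check before starting the test makes a misconfigured quorum show up as a clear message rather than as a failed ledger creation.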

6. Deploy Pulsar cluster

Modify the Pulsar configuration file
Modify the conf/broker.conf configuration file of all Pulsar nodes:

# Configure the zookeeper cluster address connected by pulsar broker
zookeeperServers=192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181
configurationStoreServers=192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181

# broker data port
brokerServicePort=6650

# broker web service port
webServicePort=8080

# pulsar cluster name, same as configured when zookeeper initialized cluster metadata
clusterName=pulsar-cluster-1

# The number of bookies used when creating a ledger
managedLedgerDefaultEnsembleSize=2

# number of replicas for each message
managedLedgerDefaultWriteQuorum=2

# The number of replica acks to wait for before completing the write operation
managedLedgerDefaultAckQuorum=2
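
The clusterName in broker.conf must match the name that was passed when the cluster metadata was initialized. A minimal sketch of how this could be checked on each broker node follows; the function names get_prop and check_cluster_name are hypothetical, and the file is assumed to be a plain key=value file:

```shell
# Hypothetical helper: print the value of a key from a properties-style file.
# Usage: get_prop <file> <key>
get_prop() {
  awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

# Hypothetical check: compare the broker's clusterName with the expected cluster name.
# Usage: check_cluster_name <broker.conf path> <expected name>
check_cluster_name() {
  actual=$(get_prop "$1" clusterName)
  if [ "$actual" = "$2" ]; then
    echo "clusterName OK: $actual"
  else
    echo "clusterName mismatch: expected $2, found '$actual'" >&2
    return 1
  fi
}

# Example:
#   check_cluster_name conf/broker.conf pulsar-cluster-1
```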

Start the Pulsar cluster
Start the broker on each Pulsar node:

bin/pulsar-daemon start broker

7. The client connects to the Pulsar cluster

Modify the client configuration file
Modify the conf/client.conf file:

# pulsar cluster web service url
webServiceUrl=http://192.168.1.147:8080,192.168.1.148:8080,192.168.1.149:8080

# pulsar service port
# URL for Pulsar Binary Protocol (for produce and consume operations)
brokerServiceUrl=pulsar://192.168.1.147:6650,192.168.1.148:6650,192.168.1.149:6650

Clients produce and consume messages
The consumer subscribes to the pulsar-test topic using the following command:

  • -n: number of messages to consume
  • -s: subscription name
  • -t: subscription type; one of Exclusive, Shared, Failover, Key_Shared

bin/pulsar-client consume \
  persistent://public/default/pulsar-test \
  -n 100 \
  -s "consumer-test" \
  -t "Exclusive"

If the --url parameter is not specified and the connection information for the Pulsar cluster is not set in conf/client.conf, the client connects to pulsar://localhost:6650/ by default. You can pass --url pulsar://192.168.1.147:6650 or --url http://192.168.1.147:8080 to interact with the broker.
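
The lookup order described above can be illustrated with a short sketch. It only mirrors the order stated in this guide (command-line URL first, then conf/client.conf, then the default) and is not the client's actual implementation; the function name resolve_service_url is hypothetical:

```shell
# Hypothetical helper illustrating the lookup order: command line, then client.conf, then default.
# Usage: resolve_service_url <client.conf path> [url given on the command line]
resolve_service_url() {
  conf=$1
  cli_url=$2
  if [ -n "$cli_url" ]; then
    echo "$cli_url"
    return 0
  fi
  conf_url=""
  if [ -f "$conf" ]; then
    # Take the first uncommented brokerServiceUrl line, if any.
    conf_url=$(sed -n 's/^brokerServiceUrl=//p' "$conf" | head -n 1)
  fi
  echo "${conf_url:-pulsar://localhost:6650/}"
}

# Example:
#   resolve_service_url conf/client.conf
#   resolve_service_url conf/client.conf pulsar://192.168.1.147:6650
```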

Open a new terminal and use the following command to produce a message with the content "Hello Pulsar" to the pulsar-test topic:

  • -n: number of messages to produce
  • -m: message content

bin/pulsar-client produce \
  persistent://public/default/pulsar-test \
  -n 1 \
  -m "Hello Pulsar"

In the consumer terminal, you can see that the message has been successfully consumed:

23:20:47.418 [pulsar-client-io-1-1] INFO com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
----- got message -----
key: [null], properties: [], content: Hello Pulsar

8. Deploy Pulsar manager

Pulsar manager is a web UI tool for managing and monitoring Pulsar clusters, and a single instance can manage multiple clusters. GitHub repository: https://github.com/apache/pulsar-manager

Install Pulsar manager

wget https://dist.apache.org/repos/dist/release/pulsar/pulsar-manager/pulsar-manager-0.2.0/apache-pulsar-manager-0.2.0-bin.tar.gz
tar -zxvf apache-pulsar-manager-0.2.0-bin.tar.gz
cd pulsar-manager
tar -xvf pulsar-manager.tar
cd pulsar-manager
cp -r ../dist ui
./bin/pulsar-manager

Create Pulsar manager account
Create a super administrator account with the user name admin and password apachepulsar:

CSRF_TOKEN=$(curl http://192.168.1.147:7750/pulsar-manager/csrf-token)
curl \
    -H "X-XSRF-TOKEN: $CSRF_TOKEN" \
    -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
    -H 'Content-Type: application/json' \
    -X PUT http://192.168.1.147:7750/pulsar-manager/users/superuser \
    -d '{"name": "admin", "password": "apachepulsar", "description": "myuser", "email": "chengzw258@163.com"}'

Pulsar manager interface
Visit http://192.168.1.147:7750/ui/index.html to log in to Pulsar manager:

Click New Environment to add a Pulsar cluster:

After the addition is complete, you can view and set the relevant information of the Pulsar cluster, for example, view topic information:

Visit http://192.168.1.147:7750/bkvm to view bookie information (user name: admin, password: admin).

View ledger information:

9. Perf stress test

Pulsar provides a command-line tool, pulsar-perf, for stress testing. Use the following command to produce messages:

  • -r: total number of messages produced per second (across all producers)
  • -n: number of producers
  • -s: size of each message (bytes)
  • The topic name comes last

bin/pulsar-perf produce -r 100 -n 2 -s 1024 test-perf

# Output fields, from left to right:
# Messages produced per second: 87.2
# Throughput per second: 0.7 Mbit
# Failed messages per second: 0
# Mean latency: 5.478 ms
# Median latency: 4.642 ms
# 95% of latencies are within 11.263 ms
# 99% of latencies are within 25.802 ms
# 99.9% of latencies are within 43.757 ms
# 99.99% of latencies are within 51.956 ms
# Maximum latency: 51.956 ms

... Throughput produced: 87.2 msg/s --- 0.7 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 5.478 ms - med: 4.642 - 95pct: 11.263 - 99pct: 25.802 - 99.9pct: 43.757 - 99.99pct: 51.956 - Max: 51.956
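
When these lines are collected from a log, individual fields can be pulled out by their label. The following is a minimal sketch that assumes the line format shown above; the function name perf_field is hypothetical and is not part of pulsar-perf:

```shell
# Hypothetical helper: print the number that follows a label such as "99pct:" in a pulsar-perf line.
# Usage: perf_field <label> <line>
perf_field() {
  printf '%s\n' "$2" | awk -v label="$1" '
    { for (i = 1; i < NF; i++) if ($i == label) { print $(i + 1); exit } }
  '
}

line='... Throughput produced: 87.2 msg/s --- 0.7 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 5.478 ms - med: 4.642 - 95pct: 11.263 - 99pct: 25.802 - 99.9pct: 43.757 - 99.99pct: 51.956 - Max: 51.956'

perf_field produced: "$line"   # prints 87.2
perf_field 99pct: "$line"      # prints 25.802
perf_field Max: "$line"        # prints 51.956
```

The label is compared as a whole field, so "99pct:" does not accidentally match "99.9pct:" or "99.99pct:".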

Consume messages with the following command:

bin/pulsar-perf consume test-perf


# Output fields, from left to right:
# Messages consumed per second: 100.007
# Throughput per second: 0.781 Mbit
# Mean latency: 9.273 ms
# Median latency: 9 ms
# 95% of latencies are within 14 ms
# 99% of latencies are within 15 ms
# 99.9% of latencies are within 28 ms
# 99.99% of latencies are within 34 ms
# Maximum latency: 34 ms
... Throughput received: 100.007 msg/s -- 0.781 Mbit/s --- Latency: mean: 9.273 ms - med: 9 - 95pct: 14 - 99pct: 15 - 99.9pct: 28 - 99.99pct: 34 - Max: 34

In the Pulsar manager interface, you can view the test-perf topic. Two producers are producing messages, and one consumer is consuming them:

View the storage status of the topic:

10. Reference links

  • https://livebook.manning.com/book/pulsar-in-action/chapter-1/v-8/1
  • https://pulsar.apache.org/en/
  • https://www.jianshu.com/p/4664de047c71
  • https://mp.weixin.qq.com/s?__biz=MzUyMjkzMjA1Ng==&mid=2247487414&idx=1&sn=850ec2ccc4d2847066a98a899bd0ce1f&chksm=f9c51581ceb29c973a87c2548c45755225198ecfa2b235abec61623adfcc70c3d381be8cf501&scene=21#wechat_redirect
  • https://alexstocks.github.io/html/pulsar.html
  • https://tech.meituan.com/2015/01/13/kafka-fs-design-theory.html