1. Deployment planning
Deploying a Pulsar cluster involves the following steps (in order):
- 1. Deploy a ZooKeeper cluster and initialize the Pulsar cluster metadata.
- 2. Deploy a Bookkeeper cluster.
- 3. Deploy one or more Pulsar brokers.
- 4. Deploy Pulsar manager (optional).
2. Node planning
hostname | IP address | role | port number |
---|---|---|---|
zookeeper1 | 192.168.1.191 | zookeeper | 2181 |
zookeeper2 | 192.168.1.192 | zookeeper | 2181 |
zookeeper3 | 192.168.1.193 | zookeeper | 2181 |
bookkeeper1 | 192.168.1.194 | bookkeeper | 3181 |
bookkeeper2 | 192.168.1.195 | bookkeeper | 3181 |
bookkeeper3 | 192.168.1.196 | bookkeeper | 3181 |
pulsar1 | 192.168.1.147 | broker | 8080 (http protocol), 6650 (pulsar protocol) |
pulsar2 | 192.168.1.148 | broker | 8080 (http protocol), 6650 (pulsar protocol) |
pulsar3 | 192.168.1.149 | broker | 8080 (http protocol), 6650 (pulsar protocol) |
pulsar1 | 192.168.1.147 | pulsar-manager | 7750 |
3. Download the binary package
Download the binary package of the Pulsar distribution, which contains the files required by ZooKeeper, BookKeeper, and Pulsar:
wget https://archive.apache.org/dist/pulsar/pulsar-2.7.1/apache-pulsar-2.7.1-bin.tar.gz
After the download completes, extract the archive and change into the extracted directory:
tar xvzf apache-pulsar-2.7.1-bin.tar.gz
cd apache-pulsar-2.7.1
The extracted directory contains the following subdirectories:
directory | content |
---|---|
bin | Pulsar command-line tools, such as pulsar and pulsar-admin |
conf | Configuration files for ZooKeeper, BookKeeper, Pulsar, etc. |
data | The directory where ZooKeeper and BookKeeper store data |
lib | JAR files used by Pulsar |
logs | Log directory |
4. Deploy Zookeeper cluster
Modify the Zookeeper configuration file
Modify the conf/zookeeper.conf configuration file of all Zookeeper nodes:
# Set the ZooKeeper data storage directory.
dataDir=data/zookeeper
# Add a server.N line for each node in the configuration file, where N is the number of the ZooKeeper node.
server.1=192.168.1.191:2888:3888
server.2=192.168.1.192:2888:3888
server.3=192.168.1.193:2888:3888
Each ZooKeeper node stores its unique cluster ID in a myid file, which must be placed in the directory specified by dataDir:
# Create the data directory.
mkdir -p data/zookeeper
# Each node's ID must be unique and must match the N of its server.N line (1, 2, 3 in order). On node 1:
echo 1 > data/zookeeper/myid
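The mapping between the server.N lines and each node's myid can be sketched in shell as follows. This is only an illustration: the helper names `zk_server_lines` and `zk_myid` are hypothetical, and the sketch assumes that every node is given the same ordered list of ZooKeeper IP addresses.

```shell
#!/bin/sh
# Hypothetical helper: print one server.N line per ZooKeeper IP, numbering from 1.
zk_server_lines() {
  n=1
  for ip in "$@"; do
    echo "server.${n}=${ip}:2888:3888"
    n=$((n + 1))
  done
}

# Hypothetical helper: print the myid of a node, i.e. its 1-based position in the list.
# Usage: zk_myid <this-node-ip> <ip1> <ip2> ...
zk_myid() {
  self=$1
  shift
  n=1
  for ip in "$@"; do
    if [ "$ip" = "$self" ]; then
      echo "$n"
      return 0
    fi
    n=$((n + 1))
  done
  # The node is not part of the list.
  return 1
}

# Lines to append to conf/zookeeper.conf on every node.
zk_server_lines 192.168.1.191 192.168.1.192 192.168.1.193
# Value to write into data/zookeeper/myid on the second node.
zk_myid 192.168.1.192 192.168.1.191 192.168.1.192 192.168.1.193
```

Deriving both values from one ordered list keeps the ID in myid consistent with the matching server.N entry.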
Start the Zookeeper cluster
Start the Zookeeper service on each Zookeeper node:
bin/pulsar-daemon start zookeeper
Initialize cluster metadata
After the ZooKeeper cluster starts successfully, the Pulsar cluster metadata must be written to ZooKeeper. Because ZooKeeper replicates data across its nodes, it is sufficient to write the metadata through a single ZooKeeper node:
bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster-1 \
  --zookeeper 192.168.1.191:2181 \
  --configuration-store 192.168.1.191:2181 \
  --web-service-url http://192.168.1.147:8080,192.168.1.148:8080,192.168.1.149:8080 \
  --broker-service-url pulsar://192.168.1.147:6650,192.168.1.148:6650,192.168.1.149:6650
The parameters are described as follows:
parameter | description |
---|---|
--cluster | Pulsar cluster name |
--zookeeper | ZooKeeper address; any one machine in the ZooKeeper cluster is sufficient |
--configuration-store | Configuration store address; any one machine in the ZooKeeper cluster is sufficient |
--web-service-url | URL and port of the Pulsar cluster web service; the default port is 8080 |
--broker-service-url | URL of the broker service, used to interact with the brokers in the Pulsar cluster; the default port is 6650 |
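The multi-host values passed to --web-service-url and --broker-service-url follow one pattern: a scheme, then a comma-separated list of host:port pairs. A minimal sketch for assembling such a value is shown below; the helper name `service_url` is hypothetical and not part of Pulsar.

```shell
#!/bin/sh
# Hypothetical helper: build "<scheme>://host1:port,host2:port,..." from a host list.
# Usage: service_url <scheme> <port> <host>...
service_url() {
  scheme=$1
  port=$2
  shift 2
  list=""
  for host in "$@"; do
    # Prefix a comma only when the list already holds an entry.
    list="${list:+${list},}${host}:${port}"
  done
  echo "${scheme}://${list}"
}

service_url http 8080 192.168.1.147 192.168.1.148 192.168.1.149
service_url pulsar 6650 192.168.1.147 192.168.1.148 192.168.1.149
```

The two calls print the same values that are passed to the command above.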
5. Deploy Bookkeeper cluster
All persistent data storage in a Pulsar cluster is handled by Bookkeeper.
Modify Bookkeeper configuration file
Modify the conf/bookkeeper.conf configuration file on all BookKeeper nodes, and set the ZooKeeper servers that the BookKeeper cluster connects to:
zkServers=192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181
Start the Bookkeeper cluster
Start the Bookkeeper service on each Bookkeeper node:
bin/pulsar-daemon start bookie
Verify Bookkeeper cluster status
On any BookKeeper node, use the simpletest command of the BookKeeper shell to verify that all bookies in the cluster have started. The value 3 is the number of BookKeeper nodes:
bin/bookkeeper shell simpletest --ensemble 3 --writeQuorum 3 --ackQuorum 3 --numEntries 3
The meaning of the parameters is as follows:
- -a, --ackQuorum <arg>: ack quorum size (default 2); the number of bookie acknowledgements required before a write is considered successful
- -e, --ensemble <arg>: ensemble size (default 3); the number of bookie nodes to write data to
- -n, --numEntries <arg>: entries to write (default 1000); the number of messages in a batch
- -w, --writeQuorum <arg>: write quorum size (default 2); the number of copies of each message
This command creates a ledger whose ensemble spans the given number of bookies, writes a few entries to it, reads them back, and then deletes the ledger.
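The three sizes are related: BookKeeper expects ensemble >= writeQuorum >= ackQuorum. The sketch below checks a combination before it is passed to simpletest or written into a configuration file. The function name `check_quorum` is hypothetical, and the lower bound of 1 for the ack quorum is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: validate ensemble >= writeQuorum >= ackQuorum >= 1.
# Usage: check_quorum <ensemble> <writeQuorum> <ackQuorum>
check_quorum() {
  ensemble=$1
  write_quorum=$2
  ack_quorum=$3
  if [ "$ensemble" -ge "$write_quorum" ] &&
     [ "$write_quorum" -ge "$ack_quorum" ] &&
     [ "$ack_quorum" -ge 1 ]; then
    echo "ok"
  else
    echo "invalid: need ensemble >= writeQuorum >= ackQuorum >= 1"
    return 1
  fi
}

# Values used in the simpletest command above.
check_quorum 3 3 3
# A write quorum larger than the ensemble is rejected.
check_quorum 2 3 2 || true
```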
6. Deploy Pulsar cluster
Modify the Pulsar configuration file
Modify the conf/broker.conf configuration file of all Pulsar nodes:
# ZooKeeper cluster addresses that the Pulsar broker connects to
zookeeperServers=192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181
configurationStoreServers=192.168.1.191:2181,192.168.1.192:2181,192.168.1.193:2181
# Broker data port
brokerServicePort=6650
# Broker web service port
webServicePort=8080
# Pulsar cluster name, the same as the one used when initializing the cluster metadata
clusterName=pulsar-cluster-1
# Number of bookies used when creating a ledger
managedLedgerDefaultEnsembleSize=2
# Number of replicas of each message
managedLedgerDefaultWriteQuorum=2
# Number of replica acks to wait for before completing a write operation
managedLedgerDefaultAckQuorum=2
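Because the same values have to be set on every broker node, the edit can be scripted. The following is a minimal sketch, not an official tool: the function name `set_conf_key` is hypothetical, and it takes the simple approach of dropping any existing uncommented assignment of a key and appending the new one at the end of the file. The demonstration runs against a temporary file rather than the real conf/broker.conf.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a properties-style file.
# Any existing "key=" line is removed and the new assignment is appended.
# Usage: set_conf_key <file> <key> <value>
set_conf_key() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  # grep exits non-zero when it prints nothing, which is not an error here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  echo "${key}=${value}" >> "$tmp"
  # Write back through the original path so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demonstration on a scratch file.
conf=$(mktemp)
printf 'clusterName=\nbrokerServicePort=6650\n' > "$conf"
set_conf_key "$conf" clusterName pulsar-cluster-1
set_conf_key "$conf" managedLedgerDefaultEnsembleSize 2
cat "$conf"
rm -f "$conf"
```

Running the helper twice with the same key leaves a single assignment in the file, so the script can be re-applied safely.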
Start the Pulsar cluster
Start the broker on each Pulsar node:
bin/pulsar-daemon start broker
7. The client connects to the Pulsar cluster
Modify the client configuration file
Modify the conf/client.conf file.
# Pulsar cluster web service URL
webServiceUrl=http://192.168.1.147:8080,192.168.1.148:8080,192.168.1.149:8080
# Pulsar service port
# URL for Pulsar Binary Protocol (for produce and consume operations)
brokerServiceUrl=pulsar://192.168.1.147:6650,192.168.1.148:6650,192.168.1.149:6650
Clients produce and consume messages
A consumer subscribes to the pulsar-test topic with the following command:
- -n: number of messages to consume
- -s: subscription name
- -t: subscription type, one of Exclusive, Shared, Failover, Key_Shared
bin/pulsar-client consume \
  persistent://public/default/pulsar-test \
  -n 100 \
  -s "consumer-test" \
  -t "Exclusive"
If the --url parameter is not specified and no connection information for the Pulsar cluster is set in conf/client.conf, the client connects to pulsar://localhost:6650/ by default. You can specify --url pulsar://192.168.1.147:6650 or --url http://192.168.1.147:8080 to interact with a broker.
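The fallback order described above can be mirrored in a small shell sketch. It only illustrates the precedence and is not how pulsar-client itself is implemented: the function name `resolve_service_url` is hypothetical, and the sketch reads only the brokerServiceUrl key.

```shell
#!/bin/sh
# Hypothetical helper: pick the service URL in the order described above:
# 1. an explicit URL, 2. brokerServiceUrl from client.conf, 3. the default.
# Usage: resolve_service_url <explicit-url-or-empty> <path-to-client.conf>
resolve_service_url() {
  cli_url=$1
  conf_file=$2
  if [ -n "$cli_url" ]; then
    echo "$cli_url"
    return 0
  fi
  if [ -f "$conf_file" ]; then
    # Take the first uncommented brokerServiceUrl assignment, if any.
    conf_url=$(sed -n 's/^brokerServiceUrl=//p' "$conf_file" | head -n 1)
    if [ -n "$conf_url" ]; then
      echo "$conf_url"
      return 0
    fi
  fi
  echo "pulsar://localhost:6650/"
}

# No explicit URL and no usable client.conf: the default is printed.
resolve_service_url "" /nonexistent/client.conf
# An explicit URL always wins.
resolve_service_url "pulsar://192.168.1.147:6650" conf/client.conf
```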
Open a new terminal and use the following command to produce a message with the content "Hello Pulsar" to the pulsar-test topic:
- -n: number of messages to produce
- -m: message content
bin/pulsar-client produce \
  persistent://public/default/pulsar-test \
  -n 1 \
  -m "Hello Pulsar"
In the consumer terminal, you can see that the message has been successfully consumed:
23:20:47.418 [pulsar-client-io-1-1] INFO com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
----- got message -----
key: [null], properties: [], content: Hello Pulsar
8. Deploy Pulsar manager
Pulsar manager is a web UI tool for managing and monitoring Pulsar clusters, and a single instance can manage multiple Pulsar clusters. GitHub repository: https://github.com/apache/pulsar-manager
Install Pulsar manager
wget https://dist.apache.org/repos/dist/release/pulsar/pulsar-manager/pulsar-manager-0.2.0/apache-pulsar-manager-0.2.0-bin.tar.gz
tar -zxvf apache-pulsar-manager-0.2.0-bin.tar.gz
cd pulsar-manager
tar -xvf pulsar-manager.tar
cd pulsar-manager
cp -r ../dist ui
./bin/pulsar-manager
Create Pulsar manager account
Create a super administrator account with the user name admin and password apachepulsar:
CSRF_TOKEN=$(curl http://192.168.1.147:7750/pulsar-manager/csrf-token)
curl \
  -H "X-XSRF-TOKEN: $CSRF_TOKEN" \
  -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
  -H 'Content-Type: application/json' \
  -X PUT http://192.168.1.147:7750/pulsar-manager/users/superuser \
  -d '{"name": "admin", "password": "apachepulsar", "description": "myuser", "email": "chengzw258@163.com"}'
Pulsar manager interface
Visit http://192.168.1.147:7750/ui/index.html to log in to Pulsar manager:
Click New Environment to add a Pulsar cluster:
After the addition is complete, you can view and set the relevant information of the Pulsar cluster, for example, view topic information:
Visit http://192.168.1.147:7750/bkvm to view bookie information, user name: admin, password: admin.
View ledger information:
9. Perf stress test
Pulsar provides a command-line tool, pulsar-perf, for stress testing. Use the following command to produce messages:
- -r: total number of messages produced per second (all producers)
- -n: number of producers
- -s: the size of each message (bytes)
- The topic name is given as the last argument
bin/pulsar-perf produce -r 100 -n 2 -s 1024 test-perf
# Output fields, from left to right:
# Messages produced per second: 87.2
# Traffic per second: 0.7 Mbit
# Failed messages per second: 0
# Mean latency: 5.478 ms
# Median latency: 4.642 ms
# 95% of latencies are within 11.263 ms
# 99% of latencies are within 25.802 ms
# 99.9% of latencies are within 43.757 ms
# 99.99% of latencies are within 51.956 ms
# Maximum latency: 51.956 ms
...
Throughput produced: 87.2 msg/s --- 0.7 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 5.478 ms - med: 4.642 - 95pct: 11.263 - 99pct: 25.802 - 99.9pct: 43.757 - 99.99pct: 51.956 - Max: 51.956
Consume messages with the following command:
bin/pulsar-perf consume test-perf
# Output fields, from left to right:
# Messages consumed per second: 100.007
# Traffic per second: 0.781 Mbit
# Mean latency: 9.273 ms
# Median latency: 9 ms
# 95% of latencies are within 14 ms
# 99% of latencies are within 15 ms
# 99.9% of latencies are within 28 ms
# 99.99% of latencies are within 34 ms
# Maximum latency: 34 ms
...
Throughput received: 100.007 msg/s -- 0.781 Mbit/s --- Latency: mean: 9.273 ms - med: 9 - 95pct: 14 - 99pct: 15 - 99.9pct: 28 - 99.99pct: 34 - Max: 34
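The reported bandwidth can be cross-checked against the message rate and size. The sketch below extracts the msg/s figure from a summary line and recomputes the Mbit/s value. The helper names `perf_rate` and `expected_mbit` are hypothetical, and the conversion factor (1 Mbit = 1024 * 1024 bits) is an assumption inferred from the sample output above, where 100.007 messages of 1024 bytes per second are reported as 0.781 Mbit/s.

```shell
#!/bin/sh
# Hypothetical helper: read pulsar-perf output on stdin and print the msg/s value
# of each "Throughput produced/received" line.
perf_rate() {
  sed -n 's/.*Throughput [a-z]*: *\([0-9.]*\) *msg\/s.*/\1/p'
}

# Hypothetical helper: expected bandwidth in Mbit/s for a rate (msg/s) and a
# message size (bytes), assuming 1 Mbit = 1024 * 1024 bits.
# Usage: expected_mbit <rate> <size-in-bytes>
expected_mbit() {
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%.3f\n", r * s * 8 / 1024 / 1024 }'
}

# Sample line shortened from the consumer output above.
line='Throughput received: 100.007 msg/s -- 0.781 Mbit/s --- Latency: mean: 9.273 ms'
rate=$(printf '%s\n' "$line" | perf_rate)
echo "rate: ${rate} msg/s"
echo "expected: $(expected_mbit "$rate" 1024) Mbit/s"
```

For the sample line, the recomputed value agrees with the 0.781 Mbit/s that pulsar-perf printed.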
In the Pulsar manager interface, you can view the test-perf topic. Two producers are producing messages, and one consumer is consuming them:
View the storage status of topic: