Basic Apache Pulsar cluster howto
Apache Pulsar is a distributed messaging system. I was doing a proof of concept, and here are instructions on how to get a basic Pulsar cluster going. This was done on CentOS 7.
The official documentation is pretty decent, and the instructions below are distilled from it.
The setup is a three-node ZooKeeper cluster and a three-node broker cluster, where each broker node also runs a BookKeeper bookie. The nodes run CentOS 7 and the Pulsar version is 2.5.1. First, there should be some DNS records:
zoo1.example.net IN A 10.1.1.1
zoo2.example.net IN A 10.1.1.2
zoo3.example.net IN A 10.1.1.3
broker1.example.net IN A 10.1.1.4
broker2.example.net IN A 10.1.1.5
broker3.example.net IN A 10.1.1.6
pulsar-cl.example.net IN A 10.1.1.4
pulsar-cl.example.net IN A 10.1.1.5
pulsar-cl.example.net IN A 10.1.1.6
The following steps are to be performed on all 6 systems.
Install Java:
[root@zoo1 ~]# yum install java-devel
Next, set $JAVA_HOME for the whole system by creating /etc/profile.d/java.sh with the following content:
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
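The nested readlink calls above assume javac is reached through exactly two symlinks, which is how the alternatives system on CentOS 7 lays it out. Below is a minimal sketch of a more tolerant variant, assuming GNU readlink (the -f flag) is available. The function name java_home_from is made up for illustration.

```shell
# Hypothetical helper: derive JAVA_HOME from the path of a javac binary.
# readlink -f follows the whole symlink chain, however long it is.
java_home_from() {
    real_javac=$(readlink -f "$1") || return 1
    # Strip the trailing /bin/javac to get the JDK root.
    dirname "$(dirname "$real_javac")"
}

# Possible use in /etc/profile.d/java.sh:
#   export JAVA_HOME=$(java_home_from "$(command -v javac)")
```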
Create a service account and lock it:
[root@zoo1 ~]# groupadd -r pulsar
[root@zoo1 ~]# useradd -r -g pulsar -d /opt/pulsar -s /bin/bash -c "Apache Pulsar" pulsar
[root@zoo1 ~]# passwd -l pulsar
Decompress the Pulsar tarball (assumed to be already downloaded to /opt) and create a symlink:
[root@zoo1 ~]# cd /opt
[root@zoo1 opt]# tar zxvf apache-pulsar-2.5.1-bin.tar.gz
[root@zoo1 opt]# ln -s apache-pulsar-2.5.1/ pulsar
The steps below need to be performed on the three ZooKeeper machines:
[root@zoo1 opt]# mkdir -p pulsardata/zookeeper
[root@zoo1 opt]# chown -R pulsar:pulsar pulsardata/
In /opt/pulsar/conf/zookeeper.conf make the following changes:
server.1=zoo1.example.net:2888:3888
server.2=zoo2.example.net:2888:3888
server.3=zoo3.example.net:2888:3888
dataDir=/opt/pulsardata/zookeeper
Now, each ZooKeeper server needs a unique ID, which must match the number in its server.N line in zookeeper.conf. The IDs do not necessarily have to be sequential, so for simplicity I used the hostname index. The ID goes into the myid file inside dataDir, that is /opt/pulsardata/zookeeper/myid:
[root@zoo1 opt]# echo 1 > /opt/pulsardata/zookeeper/myid
[root@zoo1 opt]# chown pulsar:pulsar /opt/pulsardata/zookeeper/myid
Similarly, on zoo2 I would echo 2 into the myid file, and so on. Next, start the ZooKeeper service. Note that no systemd units are included in the tarball, so you have to write those yourself.
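Since the tarball ships no unit files, here is a rough sketch of what one could look like, wrapped in a shell function so the same template can serve all three services. The function name pulsar_unit and the [Service] settings (restart policy, file descriptor limit) are assumptions, not something taken from the Pulsar documentation. The bin/pulsar subcommands zookeeper, bookie and broker run the respective component in the foreground.

```shell
# Hypothetical helper: print a systemd unit for one Pulsar component.
# $1 = name used in the description, $2 = bin/pulsar subcommand
pulsar_unit() {
    cat <<EOF
[Unit]
Description=Apache Pulsar $1
After=network.target

[Service]
User=pulsar
Group=pulsar
ExecStart=/opt/pulsar/bin/pulsar $2
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root), followed by a systemd reload:
#   pulsar_unit ZooKeeper zookeeper > /etc/systemd/system/pulsar.zookeeper.service
#   systemctl daemon-reload
```

For the later steps, the same function would be called with bookie and broker as the second argument to produce pulsar.bookkeeper.service and pulsar.broker.service.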
[root@zoo1 opt]# systemctl enable pulsar.zookeeper
[root@zoo1 opt]# systemctl start pulsar.zookeeper
Finally, initialize the cluster metadata in ZooKeeper. You only need to do this once, on one machine in the cluster:
[root@zoo1 opt]# /opt/pulsar/bin/pulsar initialize-cluster-metadata --cluster pulsar-cl --zookeeper zoo1.example.net:2181 --configuration-store zoo1.example.net:2181 --web-service-url http://pulsar-cl.example.net:8080 --web-service-url-tls https://pulsar-cl.example.net:8443 --broker-service-url pulsar://pulsar-cl.example.net:6650 --broker-service-url-tls pulsar+ssl://pulsar-cl.example.net:6651
This concludes the basic ZooKeeper setup. Now, on to the remaining three broker nodes.
Create the data directory for BookKeeper:
[root@broker1 opt]# mkdir -p pulsardata/bookkeeper
[root@broker1 opt]# chown -R pulsar:pulsar pulsardata/
In /opt/pulsar/conf/bookkeeper.conf specify the ZooKeeper servers, optionally enable stateful functions, and set custom directories:
zkServers=zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent
journalDirectory=/opt/pulsardata/bookkeeper/journal
ledgerDirectories=/opt/pulsardata/bookkeeper/ledgers
Now you can start the bookies. Again, systemd units are not included in the Pulsar tarball:
[root@broker1 opt]# systemctl enable pulsar.bookkeeper
[root@broker1 opt]# systemctl start pulsar.bookkeeper
Perform a sanity check on the broker nodes:
[root@broker1 opt]# /opt/pulsar/bin/bookkeeper shell bookiesanity
Next, configure the brokers. Set the following parameters in /opt/pulsar/conf/broker.conf:
zookeeperServers=zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
configurationStoreServers=zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
clusterName=pulsar-cl
functionsWorkerEnabled=true
allowAutoTopicCreation=false
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
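The ZooKeeper connect string is repeated in bookkeeper.conf and twice in broker.conf, and a single wrong separator in it is easy to miss. One way to avoid retyping it is to generate it. Below is a small sketch; the function name zk_connect is made up for illustration, and 2181 is the client port used throughout this post.

```shell
# Hypothetical helper: join host names into host:2181,host:2181,...
zk_connect() {
    out=""
    for host in "$@"; do
        # Put a comma in front of every entry except the first.
        out="${out:+$out,}${host}:2181"
    done
    printf '%s\n' "$out"
}

# Example:
#   zk_connect zoo1.example.net zoo2.example.net zoo3.example.net
#   prints zoo1.example.net:2181,zoo2.example.net:2181,zoo3.example.net:2181
```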
Then verify that the ports match the ones used during metadata initialization:
brokerServicePort=6650
brokerServicePortTls=6651
webServicePort=8080
webServicePortTls=8443
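To compare the ports without opening the file in an editor, the values can be pulled out of broker.conf directly. This is a minimal sketch, assuming the file uses plain key=value lines as shown above. conf_get is a made-up helper name.

```shell
# Hypothetical helper: print the value of one key from a key=value file.
# Commented-out lines are skipped because they start with '#', not the key.
conf_get() {
    # $1 = file, $2 = key; if the key appears more than once, the last line wins.
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Example:
#   for key in brokerServicePort brokerServicePortTls webServicePort webServicePortTls; do
#       echo "$key=$(conf_get /opt/pulsar/conf/broker.conf "$key")"
#   done
```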
Enable Pulsar functions in /opt/pulsar/conf/functions_worker.yml:
pulsarFunctionsCluster: pulsar-cl
Finally, start the brokers:
[root@broker1 opt]# systemctl enable pulsar.broker
[root@broker1 opt]# systemctl start pulsar.broker
One more thing: configure the client utilities by setting the following parameters in /opt/pulsar/conf/client.conf:
brokerServiceUrl=pulsar://pulsar-cl.example.net:6650/
webServiceUrl=http://pulsar-cl.example.net:8080/
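A quick way to check that the cluster actually works is to list the brokers and push one message through. The sketch below only prints the commands unless RUN=1 is set, so it can be reviewed first. The function names, topic and subscription are made up for illustration. It assumes the public/default namespace exists (create the tenant and namespace with pulsar-admin first if it does not), and it creates the topic explicitly because allowAutoTopicCreation is set to false in broker.conf.

```shell
# Hypothetical smoke test; prints the commands unless RUN=1 is set.
PULSAR_HOME=${PULSAR_HOME:-/opt/pulsar}
TOPIC=persistent://public/default/smoke-test   # made-up topic name

# Run the command when RUN=1, otherwise just print it.
step() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# The subscription is created before producing so that the message
# is retained for the consumer that reads it in the last step.
pulsar_smoke_test() {
    step "$PULSAR_HOME/bin/pulsar-admin" brokers list pulsar-cl &&
    step "$PULSAR_HOME/bin/pulsar-admin" topics create "$TOPIC" &&
    step "$PULSAR_HOME/bin/pulsar-admin" topics create-subscription -s smoke-sub "$TOPIC" &&
    step "$PULSAR_HOME/bin/pulsar-client" produce "$TOPIC" --messages hello &&
    step "$PULSAR_HOME/bin/pulsar-client" consume "$TOPIC" -s smoke-sub -n 1
}
```

Calling pulsar_smoke_test prints the five commands. Calling it as RUN=1 pulsar_smoke_test on a broker node executes them against the cluster configured in client.conf.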
This should result in a working Pulsar cluster. There is no security or encryption set up. Unfortunately, the official docs are not complete when it comes to securing the individual components using SSL certificates, at least for now.