What is Apache Kafka?

Apache Kafka is a distributed messaging and event-streaming platform that collects and processes large volumes of data in real time. Running it inside a Kubernetes cluster gives the applications deployed there a shared, reliable channel for exchanging events.

This in-depth tutorial shows you how to configure a Kafka server on a Kubernetes cluster.


How Does Apache Kafka Work?

Apache Kafka is based on a publish-subscribe model:

  1. Producers produce messages and publish them to topics.
  2. Kafka categorizes the messages into topics and stores them in an append-only log, so stored messages are immutable.
  3. Consumers subscribe to a specific topic and read the messages published by the producers.

Producers and Consumers in this context represent applications that produce event-driven messages and applications that consume those messages. The messages are stored on Kafka brokers, sorted by user-defined topics.
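The publish-subscribe flow described above can be illustrated with a minimal in-memory sketch. This is not Kafka's API: the MiniBroker class and its publish and read methods are hypothetical names used only for illustration, and a real broker adds partitions, replication, and persistence.

```python
class MiniBroker:
    """Toy stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        # topic name -> list of messages, in arrival order
        self._logs = {}

    def publish(self, topic, message):
        """Append a message to the topic's log and return its offset."""
        log = self._logs.setdefault(topic, [])
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset):
        """Return the messages at or after `offset`; the log itself is left untouched."""
        return tuple(self._logs.get(topic, [])[offset:])


broker = MiniBroker()
broker.publish("orders", "order-1 created")
broker.publish("orders", "order-2 created")
broker.publish("payments", "payment-1 received")

# Two independent consumers read the same topic; reading does not remove messages.
consumer_a = broker.read("orders", 0)
consumer_b = broker.read("orders", 1)
print(consumer_a)
print(consumer_b)
```

Because reading never deletes anything, any number of consumers can process the same topic independently, each tracking its own position in the log.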

Basic Kafka cluster architecture.

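ZooKeeper is a required component of the Kafka configuration used in this tutorial. It tracks broker membership, coordinates the brokers, and stores cluster metadata. Newer Kafka releases can also run without ZooKeeper in KRaft mode, which this tutorial does not cover.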

Deploy ZooKeeper

In this setup, Kafka cannot function without ZooKeeper. The Kafka service keeps restarting until it detects a working ZooKeeper deployment.

Deploy ZooKeeper first by creating a YAML file named zookeeper.yml. The file defines a Service and a Deployment that schedule ZooKeeper pods on the Kubernetes cluster.


Note: YAML (originally "Yet Another Markup Language", now "YAML Ain't Markup Language") is a file format that both people and software tools can read easily. Files like the ones presented in this tutorial are freely available in online repositories such as GitHub. In their current form, they are not meant for a production environment; edit them to fit your system's requirements.


Use your preferred text editor to add the following fields to zookeeper.yml:

apiVersion: v1
kind: Service
metadata:
  name: zk-s
  labels:
    app: zk-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zk-1
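---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zk-deployment-1
spec:
  replicas: 1
  # apps/v1 Deployments require a selector that matches the pod template labels.
  selector:
    matchLabels:
      app: zk-1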
  template:
    metadata:
      labels:
        app: zk-1
    spec:
      containers:
      - name: zk1
        image: bitnami/zookeeper
        ports:
        - containerPort: 2181
        env:
        # Variable names are image-specific; confirm them in the image's documentation.
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zk1

Run the following command to create the Service and Deployment defined in the file:

kubectl create -f zookeeper.yml
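A Service only routes traffic to pods whose labels contain every key-value pair in its selector, which is why the app: zk-1 value appears in both parts of zookeeper.yml. A small sketch of that matching rule, using plain dictionaries that mirror the manifest (the selector_matches helper is a hypothetical name, not part of any Kubernetes client library):

```python
def selector_matches(selector, pod_labels):
    """Return True if every selector key/value pair appears in the pod labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Values copied from zookeeper.yml
service_selector = {"app": "zk-1"}
pod_template_labels = {"app": "zk-1"}

# The ZooKeeper pods carry the label the Service selects on.
print(selector_matches(service_selector, pod_template_labels))

# Pods labeled for the Kafka application would not be selected by this Service.
print(selector_matches(service_selector, {"app": "kafkaApp"}))
```

If the Service reports no endpoints after deployment, a mismatch between these two label sets is one of the first things to check.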

Create Kafka Service

Next, create a Kafka Service definition file. The Service gives the Kafka broker pods a stable network endpoint and load-balances traffic across them. A basic kafka-service.yml file contains the following elements:

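apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafkaApp
  name: kafka
spec:
  ports:
    # Kubernetes requires a name for each port when a Service exposes more than one.
    - name: kafka
      port: 9092
      targetPort: 9092
      protocol: TCP
    - name: zookeeper
      port: 2181
      targetPort: 2181
  selector:
    app: kafkaApp
  type: LoadBalancer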

Once you have saved the file, create the service by entering the following command:

kubectl create -f kafka-service.yml

Note: In the Kafka Service definition file above, type is set to LoadBalancer. If Kubernetes is deployed on bare metal, use MetalLB, a load balancer implementation for bare-metal Kubernetes clusters.


Define Kafka Replication Controller

Create an additional .yml file that defines a replication controller for Kafka. In our example, the file is named kafka-repcon.yml and contains the following fields:

---
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    app: kafkaApp
  name: kafka-repcon
spec:
  replicas: 1
  selector:
    app: kafkaApp
  template:
    metadata:
      labels:
        app: kafkaApp
    spec:
      containers:
        - command:
            - zookeeper-server-start.sh
            - /config/zookeeper.properties
          image: "wurstmeister/kafka"
          name: zk1
          ports:
            - containerPort: 2181

Save the replication controller definition file and create it by using the following command:

kubectl create -f kafka-repcon.yml

Start Kafka Server

The configuration properties for a Kafka server are defined in the config/server.properties file. With ZooKeeper already running, start the Kafka server from the Kafka installation directory:

kafka-server-start.sh config/server.properties

How to Create Kafka Topic

Kafka has a command-line utility called kafka-topics.sh. Use this utility to create topics on the server. Open a new terminal window and type:

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Topic-Name

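This command creates a topic named Topic-Name with a single partition and a replication factor of one, meaning each message is stored on one broker only. In Kafka 3.0 and later, kafka-topics.sh no longer accepts --zookeeper; pass --bootstrap-server localhost:9092 instead.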
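To make the two options concrete: --partitions sets how many independent logs the topic is split into, and --replication-factor sets how many brokers hold a copy of each partition, so it cannot exceed the number of brokers. The sketch below spreads replicas over brokers in round-robin order. It is a simplified illustration, not Kafka's actual assignment algorithm, and assign_replicas is a hypothetical helper.

```python
def assign_replicas(num_partitions, replication_factor, broker_ids):
    """Map each partition number to the list of brokers holding a copy of it."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition, then take the next ones in order.
        assignment[partition] = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
    return assignment


# The topic from this tutorial: one partition, one replica, one broker.
print(assign_replicas(1, 1, [0]))

# A larger hypothetical cluster: three partitions, two replicas each, three brokers.
print(assign_replicas(3, 2, [0, 1, 2]))
```

With a single broker, as in this tutorial, a replication factor of one is the only valid choice.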

How to Start a Kafka Producer

The config/server.properties file sets the port the broker listens on; in this example, the broker listens on port 9092. Pass the broker address to the console producer on the command line:

kafka-console-producer.sh --broker-list localhost:9092 --topic Topic-Name

Now type several lines in the terminal. The console producer sends each line as a separate message.

How to Start a Kafka Consumer

As with the producer properties, the default consumer settings are specified in the config/consumer.properties file. Open a new terminal window and enter the command for consuming messages:

kafka-console-consumer.sh --topic Topic-Name --from-beginning --bootstrap-server localhost:9092

The --from-beginning option makes the consumer start with the earliest stored message in the topic instead of reading only new ones. You can now enter messages in the producer's terminal and see them appear in the consumer's terminal.
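The effect of --from-beginning can be sketched as a choice of starting offset for a consumer with no previously stored position. The start_offset function below is a hypothetical illustration, not part of Kafka's client API:

```python
def start_offset(log_length, from_beginning):
    """Pick where a new consumer starts reading in a log of `log_length` messages."""
    # Offset 0 replays the stored history; otherwise start at the end of the
    # log and receive only messages published from now on.
    return 0 if from_beginning else log_length


log = ["first", "second", "third"]  # messages already stored in the topic

replayed = log[start_offset(len(log), from_beginning=True):]
new_only = log[start_offset(len(log), from_beginning=False):]
print(replayed)
print(new_only)
```

Without the option, a freshly started console consumer shows nothing until the producer sends another message.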

How to Scale a Kafka Cluster

Use kubectl to administer the Kafka cluster directly from the terminal. The following command scales the cluster from one (1) pod to six (6) by changing the replica count of the kafka-repcon replication controller:

kubectl scale rc kafka-repcon --replicas=6

Conclusion

By following the instructions in this tutorial, you have installed Kafka on Kubernetes. A single Kafka broker can handle a high volume of reads and writes from many clients simultaneously.

If you are deploying applications within a Kubernetes cluster, use Kafka to improve the capacity of your apps to exchange information in real-time.

For alternative message brokers, check out our article on deploying RabbitMQ on Kubernetes.

