Kafka

Kafka - Cluster Architecture

Kafka is built on the commit log abstraction: it stores key-value messages that come from producers. Producers are the publishers of messages; they send data to Kafka brokers.

A stream of messages belonging to a particular category is known as a topic. Topics may be split into partitions, and within each partition the messages are stored in the order they arrive.

Each message within a partition is assigned a unique sequential id called an offset. Partitions have backups known as replicas, which exist primarily to prevent loss of data; clients do not read from or write to them directly.
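The partition and offset model can be sketched as a toy in-memory log. This is not the real Kafka client: the names `Topic` and `partition_for` are made up for illustration, and real Kafka hashes keys with murmur2 and persists partitions on disk.

```python
# Toy model: a topic is a list of partitions, each an append-only log;
# the offset is simply a message's index within its own partition.

def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the message key (real Kafka uses murmur2)."""
    return hash(key) % num_partitions  # hash() is illustrative only

class Topic:
    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple[int, int]:
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # offsets are per-partition
        return p, offset

topic = Topic("orders", num_partitions=3)
p1, o1 = topic.append("user-42", "created")
p2, o2 = topic.append("user-42", "paid")
assert p1 == p2      # same key -> same partition, so per-key ordering holds
assert o2 == o1 + 1  # offsets grow sequentially within a partition
```

Because messages with the same key always land in the same partition, Kafka can guarantee ordering per key without coordinating across partitions.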

Stream processing is done through the Kafka Streams API, which lets Java applications consume data from Kafka, process it, and write the results back to Kafka.
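The read-process-write pattern behind Kafka Streams can be sketched in plain Python. The real Kafka Streams API is a Java library; the topic lists below are stand-ins for actual Kafka topics.

```python
# Plain-Python sketch of the read-process-write pattern:
# consume from a source topic, transform records, produce to a sink topic.

input_topic = ["error: disk full", "info: started", "error: timeout"]
output_topic = []

for record in input_topic:                   # consume from the source topic
    if record.startswith("error"):           # stateless filter step
        output_topic.append(record.upper())  # produce to the sink topic

assert output_topic == ["ERROR: DISK FULL", "ERROR: TIMEOUT"]
```

In the real API the filter and transform steps are declared as a topology (e.g. filter and map operations on a stream) rather than written as an explicit loop.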

Kafka runs as a cluster of one or more servers. These servers are also known as brokers and are responsible for maintaining the published data. Partitions are distributed across the brokers and replicated among them; each broker may host zero or more partitions of a given topic.
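How partitions spread across brokers can be illustrated with a simple round-robin assignment. This is a sketch with a hypothetical `assign_partitions` helper; it places only the partition leaders, whereas real Kafka also places replicas on distinct brokers.

```python
# Toy round-robin assignment of a topic's partitions across brokers.

def assign_partitions(num_partitions: int, brokers: list[int]) -> dict[int, int]:
    """Map partition id -> broker id, spreading partitions evenly."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}

assignment = assign_partitions(6, brokers=[101, 102, 103])
assert assignment == {0: 101, 1: 102, 2: 103, 3: 101, 4: 102, 5: 103}

# With more brokers than partitions, some brokers host zero partitions
# of this topic -- exactly the "zero or more" case described above.
sparse = assign_partitions(2, brokers=[101, 102, 103])
assert 103 not in sparse.values()
```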

Consumers subscribe to topics and consume published messages by reading data from the brokers. Each partition has a leader: the node responsible for all reads and writes for that particular partition. A follower is a node that replicates the leader's data. If a leader fails, one of the followers takes its place.

This architecture allows Kafka to work efficiently and tolerate faults without downtime, which is why Kafka is a strong alternative to traditional messaging systems such as those based on the Advanced Message Queuing Protocol (AMQP) or the Java Message Service (JMS) API.


A Kafka cluster consists of multiple brokers, each hosting its share of partitions. While a single Kafka cluster is sufficient for local development, running multiple clusters offers several advantages.

These include:

  • Isolation of types of data 

Multiple clusters make it easy to segregate different types of data and fetch each type from its own set of brokers.

  • Isolation for security 

Apache Kafka's security measures allow large amounts of data to be stored in separate clusters, each with its own security requirements.

  • Multiple Data centers

With multiple data centers, recovering data in the event of a disaster is straightforward. Data is copied between the data centers, so if a crash or any other unwanted event happens, it can be recovered from another site.