
Apache Kafka's architecture

Apache Kafka's architecture is designed for high throughput, fault tolerance, and scalability, making it a powerful distributed streaming platform. Here's a detailed explanation of Kafka's architecture:

Core Components

  1. Producers:

    • Function: Producers are applications that publish (write) data to Kafka topics.
    • Mechanism: Producers send data to Kafka brokers, specifying the topic to which the data should be published. Producers can also decide which partition within the topic to send the data to, often using a key-based partitioning strategy.
  2. Consumers:

    • Function: Consumers are applications that subscribe to (read) data from Kafka topics.
    • Mechanism: Consumers read data from Kafka topics, processing the messages they receive. Consumers belong to consumer groups, and Kafka ensures that each partition's data is only read by one consumer within each group.
  3. Topics:

    • Function: Topics are logical channels to which producers send records and from which consumers receive records.
    • Mechanism: Topics are divided into partitions to allow for parallel processing and scalability. Each record in a topic is assigned to a partition, which is an ordered, immutable sequence of records.
  4. Partitions:

    • Function: Partitions are a way to parallelize a topic by splitting it into multiple segments.
    • Mechanism: Each partition is an ordered sequence of records and is the unit of parallelism in Kafka. Partitions are distributed across brokers in the Kafka cluster, enabling horizontal scaling and fault tolerance.
  5. Brokers:

    • Function: Brokers are servers that store data and serve client requests.
    • Mechanism: Each broker handles read and write requests for partitions, persists data to disk, and replicates data to ensure fault tolerance. Brokers communicate with each other to coordinate and manage the cluster.
  6. ZooKeeper:

    • Function: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
    • Mechanism: Kafka uses ZooKeeper to manage the cluster metadata, including information about brokers, topics, and partitions. It also helps with leader election for partitions. Note that newer Kafka versions (2.8 and later) can run without ZooKeeper using the built-in KRaft consensus mode, which became production-ready in Kafka 3.3.
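
The key-based partitioning mentioned under Producers can be sketched in a few lines. This is a simplified illustration of the idea only, not a real client implementation (the Java producer's default partitioner, for example, uses murmur2 hashing rather than CRC32, and the function name here is invented):

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition deterministically.

    crc32 stands in for the real client's hash function; the point is
    that the mapping is a pure function of the key.
    """
    return zlib.crc32(key) % num_partitions

# Records with the same key always land in the same partition,
# so their relative order is preserved within that partition.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
assert p1 == p2
```

Because the partition is derived only from the key, all records for `user-42` end up in one partition log, which is what gives Kafka its per-key ordering guarantee.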

Key Features and Mechanisms

  1. Partitioning:

    • Partitions enable Kafka to distribute data across multiple brokers. Each partition can be hosted on a different broker, allowing Kafka to scale horizontally.
    • Producers can specify which partition to write to, often based on a message key, to ensure that messages with the same key are ordered within the partition.
  2. Replication:

    • Each partition has a configurable number of replicas to ensure data durability and fault tolerance.
    • One replica is designated as the leader, and the others are followers. The leader handles all read and write requests, while followers replicate the data.
    • If the leader fails, a follower is automatically promoted to leader.
  3. Consumer Groups:

    • Consumers are organized into consumer groups. Each consumer in a group reads from a unique subset of the partitions in the topic.
    • Kafka ensures that each partition is consumed by only one consumer within a group, enabling parallel processing of data.
  4. Offset Management:

    • Kafka tracks, for each consumer group, the offset of the last committed record in every partition (stored in the internal __consumer_offsets topic). This allows consumers to resume reading from the same point in case of failure.
    • Consumers can commit their offsets either automatically or manually, providing flexibility in handling message processing.
  5. High Throughput and Low Latency:

    • Kafka achieves high throughput and low latency through efficient data writing and reading mechanisms: append-only sequential disk writes, heavy use of the operating system's page cache, zero-copy transfer of data to consumers, and batching and compression of records.
  6. Durability and Fault Tolerance:

    • Data is persisted to disk and replicated across multiple brokers to ensure durability and fault tolerance.
    • Kafka can handle broker failures seamlessly by promoting replicas to leaders and redistributing load.
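
The consumer-group rule above (each partition owned by exactly one consumer in the group) can be sketched as a simple round-robin assignor. This is a hypothetical illustration; Kafka's real clients ship several assignor strategies (range, round-robin, cooperative sticky), and the function below is not their actual code:

```python
def assign_partitions(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Spread partitions over consumers so each partition has exactly one owner."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        # round-robin: partition p goes to consumer p mod group size
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment

# 6 partitions over 2 consumers: each consumer owns 3 partitions, none shared.
print(assign_partitions(["c1", "c2"], 6))
```

When a consumer joins or leaves, Kafka recomputes an assignment like this during a group rebalance, which is why adding consumers (up to the partition count) increases parallelism.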

Data Flow in Kafka

  1. Producing Data:

    • Producers send records to a specific topic.
    • The record is assigned to a partition (based on a key or round-robin).
    • The broker that hosts the partition leader receives the record and appends it to the partition log.
  2. Storing Data:

    • The leader broker stores the record on disk and sends it to follower replicas.
    • Followers acknowledge the receipt of the record to the leader.
  3. Consuming Data:

    • Consumers subscribe to topics and request records from partitions.
    • The broker responds with records from the requested partition, starting from the consumer's last committed offset.
    • Consumers process the records and optionally commit their offsets.
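
The three-stage flow above can be modeled with a toy in-memory partition log. Everything here (the PartitionLog class and its methods) is invented for illustration; it mirrors only the append/fetch/commit semantics, not Kafka's on-disk segment storage or replication:

```python
class PartitionLog:
    """Toy model of one partition: an ordered, append-only record list."""

    def __init__(self):
        self.records = []

    def append(self, record) -> int:
        """Producing: append to the log and return the record's offset."""
        self.records.append(record)
        return len(self.records) - 1

    def fetch(self, offset: int, max_records: int = 10):
        """Consuming: return records starting from the given offset."""
        return self.records[offset:offset + max_records]

log = PartitionLog()
for msg in ("a", "b", "c"):
    log.append(msg)

committed = 0                   # consumer group's last committed offset
batch = log.fetch(committed)    # read from the committed offset
committed += len(batch)         # commit after processing the batch
# After a restart the consumer resumes from `committed` and re-reads nothing.
assert log.fetch(committed) == []
```

Because the consumer controls when `committed` advances, it chooses between at-least-once delivery (commit after processing, as above) and at-most-once delivery (commit before processing).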

[Figure: Kafka cluster diagram]

Published on: Jun 17, 2024, 11:40 PM  
 
