Apache Kafka — Introduction

What Problem Does Kafka Solve?

When systems need to handle millions of events per second reliably, traditional messaging systems often run into problems:

  • Data loss
  • Poor scalability
  • No easy replay of events

Kafka is built to solve these problems.


What is Kafka?

Apache Kafka is a distributed event streaming platform designed for:

  • High throughput
  • Fault tolerance
  • Real-time data pipelines

At its core, Kafka is:

  • A distributed commit log
  • A publish-subscribe system
  • A replayable event store

Key Characteristics

  • High Throughput → Millions of messages per second
  • Scalable → Horizontally scalable across brokers
  • Fault-Tolerant → Data replication across servers
  • Durable → Messages persisted and replayable

How Kafka Works

  1. Producer sends a message
  2. Kafka assigns it to a partition
  3. Message gets an offset
  4. Stored in a broker
  5. Consumers read using offsets
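
The five steps above can be sketched with a tiny in-memory model. This is only an illustration, not Kafka's API: `MiniTopic`, `send`, and `read` are hypothetical names, and `zlib.crc32` is an assumed stand-in for the hash a real client would use.

```python
import zlib


class MiniTopic:
    """Toy in-memory topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Step 2: pick a partition from the key (crc32 is a stand-in hash).
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        # Steps 3-4: append to that partition's log; the offset is the index.
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def read(self, partition, offset):
        # Step 5: a consumer reads everything from a given offset onward.
        return self.partitions[partition][offset:]


topic = MiniTopic(num_partitions=3)
p1, o1 = topic.send("user-1", "login")
p2, o2 = topic.send("user-1", "logout")

print(p1 == p2, o1, o2)      # True 0 1
print(topic.read(p1, 0))     # ['login', 'logout']
```

Both records share a key, so they land in the same partition and receive consecutive offsets.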

Basic Terms

1. Producer

A producer sends data to Kafka.

  • Publishes messages to topics
  • Can:
    • Send to a specific partition
    • Let Kafka decide

Partitioning logic:

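  • With key → hash(key) % number of partitions (the Java client uses a murmur2 hash)
  • Without key → spread across partitions (round-robin in older clients; since Kafka 2.4, sticky batching that fills a batch for one partition before moving to the next)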
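
A minimal sketch of this partitioning logic, under stated assumptions: `make_partitioner` and `choose` are hypothetical names, `zlib.crc32` stands in for the real hash function, and the keyless branch uses plain round-robin.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function mapping an optional key to a partition number."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(key=None):
        if key is None:
            # No key: spread records evenly across partitions.
            return next(round_robin)
        # Keyed: a stable hash keeps one key on one partition.
        # Assumption: crc32 is only a stand-in for the client's real hash.
        return zlib.crc32(key) % num_partitions

    return choose


choose = make_partitioner(num_partitions=4)
print([choose() for _ in range(5)])                 # [0, 1, 2, 3, 0]
print(choose(b"order-42") == choose(b"order-42"))   # True
```

Because the partition depends on the partition count, changing the number of partitions can move an existing key to a different partition.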

2. Topic

A topic is a logical stream where messages are stored.

  • Similar to a table or data stream
  • Supports multiple consumers
  • Append-only (no in-place updates or deletes; old records are removed only by retention or compaction)

3. Message (Record)

A message is the basic unit of data in Kafka.

Structure:

  • Key (optional) → partitioning
  • Value → actual data
  • Timestamp
  • Headers (optional)

Messages are immutable.
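
The record structure can be modeled roughly as a frozen dataclass. The class and field names here are illustrative assumptions, not the types of any particular Kafka client.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    value: bytes                                  # actual data
    key: Optional[bytes] = None                   # optional, used for partitioning
    timestamp_ms: int = 0                         # time in milliseconds
    headers: Tuple[Tuple[str, bytes], ...] = ()   # optional metadata pairs


record = Record(value=b'{"amount": 10}', key=b"user-1",
                timestamp_ms=1_700_000_000_000)

try:
    record.value = b"changed"                     # frozen dataclass rejects mutation
except FrozenInstanceError:
    print("records are immutable")
```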


4. Key

The key determines how messages are distributed.

  • Same key → same partition
  • Maintains ordering per key

If no key:

  • Kafka spreads messages across partitions (round-robin in older clients, sticky batching since Kafka 2.4)

5. Partition

A partition is a subset of a topic.

  • Enables parallelism and scalability
  • Append-only and ordered

Important:

  • Each message has an offset
  • Ordering is guaranteed only within a partition
  • No global ordering across topic

6. Broker

A broker is a Kafka server.

Responsibilities:

  • Receives messages
  • Stores partitions
  • Serves consumers

7. Consumer

A consumer reads messages from topics.

  • Pull-based model
  • Reads using offsets
  • Can replay data
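
The pull model and replay can be illustrated with a plain list acting as one partition. The `poll` function below is a hypothetical helper, not a client call: the consumer asks for records starting at its own position, and replaying is just rewinding that position.

```python
def poll(log, position, max_records=2):
    """Return up to max_records starting at position, plus the new position."""
    batch = log[position:position + max_records]
    return batch, position + len(batch)


partition_log = ["e0", "e1", "e2", "e3", "e4"]

position = 0
seen = []
while True:
    batch, position = poll(partition_log, position)
    if not batch:
        break               # caught up; a real consumer would keep polling
    seen.extend(batch)

print(seen)                 # ['e0', 'e1', 'e2', 'e3', 'e4']

# Replay: rewind the position and read the same records again.
position = 2
replayed, position = poll(partition_log, position, max_records=10)
print(replayed)             # ['e2', 'e3', 'e4']
```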

8. Consumer Group

A consumer group is a set of consumers working together.

  • Each partition → read by only ONE consumer in the group
  • Enables parallel processing

Rebalancing:

  • Happens when consumers join/leave
  • Kafka redistributes partitions
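
A rough sketch of assignment and rebalancing, assuming a simple round-robin strategy. Kafka ships several assignment strategies, and the `assign` function here is a hypothetical illustration rather than any of them verbatim.

```python
def assign(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

print(assign(partitions, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}

# A third consumer joins: rebalance by recomputing the assignment.
print(assign(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

If the group has more consumers than partitions, the extra consumers receive an empty list and sit idle.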

9. Offset

An offset is a unique, sequential ID for each message within a partition.

  • Starts from 0
  • Incremental and immutable

Types:

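  • Current Offset (position) → next message the consumer will read
  • Committed Offset → last position saved; the consumer resumes from here after a restart or rebalance

Kafka stores committed offsets in an internal topic: __consumer_offsets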
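
The difference between the two offset types shows up on restart. The sketch below uses a hypothetical `OffsetTracker` class for a single partition; it is not how a real client stores its state.

```python
class OffsetTracker:
    """Tracks one consumer's offsets for a single partition."""

    def __init__(self):
        self.position = 0    # current offset: next record to read
        self.committed = 0   # committed offset: where to resume after a restart

    def read(self, log):
        record = log[self.position]
        self.position += 1
        return record

    def commit(self):
        self.committed = self.position

    def restart(self):
        # In-memory position is lost; resume from the committed offset.
        self.position = self.committed


log = ["a", "b", "c", "d"]
tracker = OffsetTracker()

tracker.read(log)            # reads "a"
tracker.read(log)            # reads "b"
tracker.commit()             # committed offset is now 2
tracker.read(log)            # reads "c" but never commits
tracker.restart()

print(tracker.position)      # 2
print(tracker.read(log))     # c
```

The record "c" is read twice because its offset was never committed before the restart.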


10. Batches

A batch is a group of messages sent together.

Benefits:

  • Better network usage
  • Compression
  • Faster I/O

Trade-off:

  • Larger batch → higher throughput, but higher latency
  • Smaller batch → lower latency, but lower throughput
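
Batching can be sketched as a buffer that flushes once it reaches a size limit. The `Batcher` class is a hypothetical illustration and counts records; the real producer is tuned with settings such as `batch.size` (measured in bytes) and `linger.ms`.

```python
class Batcher:
    """Collects records and flushes them in groups of batch_size."""

    def __init__(self, batch_size, send):
        self.batch_size = batch_size
        self.send = send          # callback that receives a list of records
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []      # start a fresh list for the next batch


sent = []
batcher = Batcher(batch_size=3, send=sent.append)
for i in range(7):
    batcher.add(f"msg-{i}")
batcher.flush()                   # send the partial batch left in the buffer

print([len(batch) for batch in sent])   # [3, 3, 1]
```

With a larger `batch_size` there are fewer sends, but the first record in each batch waits longer before it leaves the buffer.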

Brokers, Cluster, and Replication

Broker

  • Single Kafka server
  • Stores partitions

Cluster

  • Multiple brokers working together
  • Provides scalability and fault tolerance

Replication

  • Partitions are replicated across brokers
  • Ensures durability and availability

Message Delivery Semantics

Kafka supports three delivery guarantees:

1. At Most Once

  • No duplicates
  • Possible data loss

2. At Least Once (Default)

  • No data loss
  • Possible duplicates

3. Exactly Once

  • No duplicates
  • No data loss
  • Higher overhead

In short:

  • At Most Once → Fast but risky
  • At Least Once → Safe but duplicates
  • Exactly Once → Correct but expensive
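
The first two guarantees differ mainly in whether the offset is committed before or after processing. The simulation below is a sketch under assumptions: `consume` is a hypothetical function, and the crash is injected at one chosen offset between the two steps.

```python
def consume(log, commit_first, crash_at):
    """Process a log with one simulated crash, restarting from the committed offset.

    Returns the records that were actually processed, in order.
    """
    processed = []
    committed = 0
    crashed = False
    while committed < len(log):
        offset = committed                    # (re)start from the committed offset
        crash_now = offset == crash_at and not crashed
        if commit_first:
            committed = offset + 1            # commit before processing
            if crash_now:
                crashed = True
                continue                      # crash: record was never processed
            processed.append(log[offset])
        else:
            processed.append(log[offset])     # process before committing
            if crash_now:
                crashed = True
                continue                      # crash: commit was never saved
            committed = offset + 1
    return processed


log = ["a", "b", "c"]
print(consume(log, commit_first=True, crash_at=1))    # ['a', 'c']
print(consume(log, commit_first=False, crash_at=1))   # ['a', 'b', 'b', 'c']
```

Committing first loses "b" (at most once); processing first repeats "b" (at least once). Exactly-once delivery is not modeled here; in Kafka it relies on idempotent producers and transactions.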

Commit Strategies

  • Auto Commit
    • Offsets are committed automatically at intervals
  • Manual Commit
    • Controlled by the consumer
    • More reliable: offsets can be committed only after processing succeeds

Real-World Use Cases

  • Log aggregation
  • Event-driven microservices
  • Real-time analytics
  • Fraud detection
  • User activity tracking

Summary

Kafka is not just a message queue.

It is a:

  • Distributed log
  • Streaming backbone
  • Real-time data platform

Use Kafka when:

  • Scale matters
  • Reliability matters
  • Real-time processing matters

#kafka #realtime

Ver 6.0.25

Last change: 2026-04-21