
Apache Kafka Deep Dive

Kafka architecture: brokers, topics, partitions, consumer groups, offsets, exactly-once semantics, and when Kafka is the right choice.

20 min read · High interview weight

What Makes Kafka Different

Apache Kafka is not a traditional message queue. It is a distributed commit log — an append-only, ordered, durable sequence of records. Unlike RabbitMQ or SQS where messages are deleted after consumption, Kafka retains messages for a configurable retention period (days, weeks, or forever with tiered storage). Consumers track their own position using offsets. This fundamentally changes the replay and reprocessing story.

Core Architecture

[Diagram] Kafka cluster: topics are split into partitions; each partition has one leader broker and N-1 followers. Consumer groups each get their own independent offset.

Topics and Partitions

A Kafka topic is divided into one or more partitions. Each partition is an ordered, immutable log. Producers write to partitions (round-robin by default, or keyed by a partition key). Consumers read from partitions sequentially using an offset — an integer index into the log.

Partitions are the unit of parallelism. With 12 partitions and 12 consumer instances in a group, you get 12x parallel consumption. You cannot have more active consumers than partitions in a group — extra consumers sit idle.

💡

Partition key selection matters

When you provide a partition key (e.g., `userId`, `orderId`), Kafka hashes it to always route records with the same key to the same partition. This guarantees ordering within a key. Without a key, Kafka round-robins across partitions — higher throughput, no per-key ordering.
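This routing can be sketched as hash-then-modulo. The snippet below uses `String.hashCode` purely as a stand-in for Kafka's actual murmur2-based default partitioner, so the partition numbers will differ from a real cluster's, but the key property is the same: equal keys always map to the same partition.

```java
// Sketch of keyed partitioning: hash the key, then take it modulo the
// partition count. Kafka's real default partitioner uses murmur2;
// String.hashCode here is an illustrative stand-in.
public class KeyedPartitioning {
    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 12;
        // The same key always yields the same partition, which is what
        // gives Kafka its per-key ordering guarantee.
        System.out.println(partitionFor("user-42", partitions));
        System.out.println(partitionFor("user-42", partitions)); // same as above
        System.out.println(partitionFor("user-7", partitions));
    }
}
```

Note that this is also why adding partitions later breaks per-key ordering: the modulus changes, so existing keys can start routing to different partitions.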

Consumer Groups and Offsets

A consumer group is a set of consumer instances that share the work of reading a topic. Kafka assigns each partition to exactly one consumer in the group. If an instance fails, Kafka rebalances — redistributing that partition to another instance. Each group independently tracks its own committed offset per partition in an internal Kafka topic called `__consumer_offsets`.

| Scenario | Partition Count | Consumer Instances | Result |
| --- | --- | --- | --- |
| Under-provisioned | 4 | 2 | Each consumer reads 2 partitions — works fine |
| Perfectly matched | 4 | 4 | Each consumer reads 1 partition — maximum parallelism |
| Over-provisioned | 4 | 8 | 4 consumers active, 4 idle — wasted resources |
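The scenarios in the table can be sketched as a simple round-robin assignment. This is an illustration rather than Kafka's exact algorithm (the real group coordinator supports several strategies, including range, round-robin, and cooperative sticky assignment), but it shows why extra consumers end up idle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of partition assignment within a consumer group: each partition
// goes to exactly one consumer, and consumers beyond the partition count
// receive nothing.
public class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Round-robin: partition p goes to consumer (p mod groupSize)
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Over-provisioned case from the table: 4 partitions, 8 consumers
        System.out.println(assign(
            List.of("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"), 4));
        // c1..c4 each get one partition; c5..c8 sit idle
    }
}
```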

Replication and Durability

Each partition has a replication factor (typically 3). One broker is the leader for that partition — it handles all reads and writes. The other brokers are followers that replicate from the leader. If the leader fails, one of the in-sync followers is elected as the new leader. The `acks` producer setting controls durability:

| `acks` setting | Behavior | Risk |
| --- | --- | --- |
| `acks=0` | Fire and forget — no wait for broker | Message loss on broker failure |
| `acks=1` | Wait for leader to acknowledge | Loss if leader fails before replication |
| `acks=all` (`-1`) | Wait for all in-sync replicas (ISR) | Safest — no acknowledged write is lost as long as one in-sync replica survives (pair with `min.insync.replicas` ≥ 2) |
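Note that `acks=all` alone does not guarantee multiple replicas have the data: it acknowledges whatever the current ISR is, which can shrink to just the leader. The topic/broker setting `min.insync.replicas` enforces the floor. A sketch of the two halves of the contract (values are illustrative):

```properties
# Producer side: wait for the full in-sync replica set
acks=all
enable.idempotence=true

# Topic/broker side: refuse writes unless at least 2 replicas are in sync.
# With replication.factor=3 this tolerates one broker failure with no loss.
min.insync.replicas=2
```

With this pairing, a produce request fails (with `NotEnoughReplicasException`) rather than being silently acknowledged by a lone leader.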

Exactly-Once Semantics

Kafka supports idempotent producers (enable with `enable.idempotence=true`) to prevent duplicate writes from retries. For end-to-end exactly-once, Kafka transactions let you atomically write to multiple partitions and commit offsets in a single atomic operation. This enables read-process-write pipelines (Kafka Streams style) with exactly-once guarantees.

```java
// Kafka producer with exactly-once semantics (Java)
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.ProducerFencedException;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("enable.idempotence", "true");
props.put("acks", "all"); // required for idempotence
props.put("transactional.id", "my-transactional-id");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();

try {
    producer.beginTransaction();
    // key and value come from the consumed record being processed
    producer.send(new ProducerRecord<>("output-topic", key, value));
    // Atomically commit the processed offsets together with the new record;
    // `offsets` and `consumerGroupId` come from the consumer side of the pipeline
    producer.sendOffsetsToTransaction(offsets, consumerGroupId);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Fatal: another producer with the same transactional.id took over
    producer.close();
} catch (Exception e) {
    producer.abortTransaction();
}
```

Kafka's Retention Model vs Traditional Queues

This is where Kafka fundamentally differs. Traditional queues delete messages after acknowledgment. Kafka retains messages for a configured period (default 7 days). This enables:

  • Replay — Re-process all events from the beginning when you deploy a new consumer service
  • Multiple independent consumers — Each consumer group has its own offset; 10 different teams can read the same topic independently
  • Time-travel debugging — Rewind a consumer to a specific timestamp to diagnose production issues
  • Event sourcing — The Kafka topic IS the source of truth; the database is a projection
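The time-travel case relies on Kafka's ability to map a timestamp to an offset (the consumer's `offsetsForTimes` API, followed by `seek`). Conceptually it is a binary search over a partition's timestamp index. A minimal local sketch of that lookup, assuming append-time timestamps (which are non-decreasing within a partition):

```java
// Sketch of what an offsets-for-times lookup does conceptually: given a
// log where each record carries a timestamp, find the earliest offset
// whose timestamp is >= the target, so a consumer can seek there.
public class TimestampRewind {
    static int earliestOffsetAtOrAfter(long[] timestamps, long target) {
        int lo = 0, hi = timestamps.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        return lo; // == timestamps.length when no record is that recent
    }

    public static void main(String[] args) {
        long[] ts = {100, 200, 300, 400, 500};
        System.out.println(earliestOffsetAtOrAfter(ts, 250)); // 2
    }
}
```

Against a real cluster you would call `consumer.offsetsForTimes(...)` to get the offset per partition, then `consumer.seek(partition, offset)` before polling.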

When to Choose Kafka

| Choose Kafka when... | Choose a traditional queue when... |
| --- | --- |
| High throughput (millions of events/sec) | Moderate throughput is sufficient |
| Multiple independent consumers need the same events | Single consumer per message is fine |
| Replay and reprocessing are required | Messages can be discarded after processing |
| Event sourcing / audit log use case | Simple task queue (jobs, background work) |
| Stream processing with Kafka Streams | Rich routing logic (RabbitMQ exchanges) |
⚠️

Kafka is operationally heavy

Kafka requires a coordination layer (ZooKeeper historically, KRaft in newer versions), careful partition planning, monitoring of consumer lag, and tuning of retention and replication. Managed services (Amazon MSK, Confluent Cloud) significantly reduce this burden, but Kafka is still heavier than SQS or RabbitMQ for simple use cases.

💡

Interview Tip

Kafka comes up constantly in system design interviews. Key points to hit: (1) It's a distributed commit log, not just a queue — messages are retained. (2) Partitions are the unit of parallelism — scale consumers by adding partitions. (3) Consumer groups let multiple independent services read the same events. (4) Use a partition key for ordering guarantees within a key. If asked 'Kafka vs SQS?' — Kafka wins on throughput, replay, and multi-consumer; SQS wins on simplicity and managed operations.
