Transactional Outbox Pattern
Guarantee message delivery alongside database writes: the outbox table, polling publisher, log-tailing approach, and exactly-once semantics.
The Dual-Write Problem
A common pattern in microservices: a service updates its database and then publishes an event to a message broker. But this is a dual write — two separate operations with no atomicity between them. If the database write succeeds but the message publish fails (or the service crashes between the two), you have inconsistency: the state changed but nobody was notified.
Conversely, if you publish the message first and the database write then fails, downstream services react to an event whose state change was never committed. The Transactional Outbox Pattern solves this by recording the message in the same database transaction as the state change; a separate process publishes it to the broker afterwards.
Never Dual-Write in Production
The temptation to call `db.save(entity)` and then `kafka.publish(event)` in sequence is everywhere. But any crash, network timeout, or broker outage between the two leaves your system inconsistent. The Transactional Outbox is a production-grade solution to this problem.
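To make the failure window concrete, here is a minimal Python sketch of the naive dual write. `FlakyBroker`, `place_order_dual_write`, and the dict standing in for the database are hypothetical stand-ins, not a real client API; the broker is simply forced to fail after the state change has already been saved.

```python
# Hypothetical stand-ins for a database and a broker, used only to show the
# failure window of a naive dual write (no real client library involved).
class FlakyBroker:
    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.messages: list[dict] = []

    def publish(self, event: dict) -> None:
        if self.fail:
            raise ConnectionError("broker unavailable")
        self.messages.append(event)


def place_order_dual_write(db: dict, broker: FlakyBroker, order_id: str) -> None:
    # Write 1: the state change is saved immediately and on its own.
    db[order_id] = {"status": "PLACED"}
    # Write 2: a separate operation. If it fails, write 1 is NOT undone.
    broker.publish({"type": "OrderPlaced", "orderId": order_id})


db: dict = {}
broker = FlakyBroker(fail=True)
try:
    place_order_dual_write(db, broker, "order-123")
except ConnectionError:
    pass

# The order exists, but no event was ever published.
print("order-123" in db, len(broker.messages))  # True 0
```

Retrying the publish does not close the gap: if the process itself dies between the two writes, there is nothing left to retry.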
The Outbox Pattern
The pattern introduces an `outbox` table in the same database as the business data. When a service writes a state change, it also inserts a record into the `outbox` table in the same transaction. A separate background process (the Message Relay) reads unpublished outbox records and publishes them to the message broker, then marks them as published.
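The write path can be sketched with Python's built-in `sqlite3` module standing in for the service database. This is a minimal sketch under that assumption: the article's schema targets Postgres, so its types are simplified to TEXT, and the `orders` table and `place_order` function are hypothetical. The point it shows is that a failure before commit discards both the order row and its outbox row.

```python
import json
import sqlite3
import uuid

# SQLite stand-in for the service database. The article's schema targets
# Postgres; UUID, JSONB, and TIMESTAMPTZ are simplified to TEXT here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id             TEXT PRIMARY KEY,
    aggregate_id   TEXT NOT NULL,
    aggregate_type TEXT NOT NULL,
    event_type     TEXT NOT NULL,
    payload        TEXT NOT NULL,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published_at   TEXT              -- NULL means unpublished
);
""")


def place_order(conn: sqlite3.Connection, order_id: str, crash: bool = False) -> None:
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the order row and its outbox row are saved together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,))
        if crash:
            raise RuntimeError("simulated failure before commit")
        conn.execute(
            "INSERT INTO outbox (id, aggregate_id, aggregate_type, event_type, payload) "
            "VALUES (?, ?, 'Order', 'OrderPlaced', ?)",
            (str(uuid.uuid4()), order_id, json.dumps({"orderId": order_id})),
        )


place_order(conn, "order-123")
try:
    place_order(conn, "order-456", crash=True)
except RuntimeError:
    pass  # order-456 was rolled back, so no outbox row exists for it either

print([row[0] for row in conn.execute("SELECT id FROM orders")])  # ['order-123']
```

Note that nothing here talks to a broker: the service only commits rows, and publishing is left entirely to the relay.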
The Outbox Table Schema
-- Outbox table
CREATE TABLE outbox (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_id   VARCHAR(255) NOT NULL,  -- e.g. orderId
    aggregate_type VARCHAR(255) NOT NULL,  -- e.g. 'Order'
    event_type     VARCHAR(255) NOT NULL,  -- e.g. 'OrderPlaced'
    payload        JSONB NOT NULL,         -- full event payload
    created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    published_at   TIMESTAMPTZ             -- NULL means unpublished
);
-- Service code: write in a single transaction
BEGIN;
INSERT INTO orders (id, user_id, status, ...) VALUES (...);
INSERT INTO outbox (aggregate_id, aggregate_type, event_type, payload)
VALUES ('order-123', 'Order', 'OrderPlaced', '{"orderId": "order-123", ...}');
COMMIT;
Message Relay Implementations
There are two main ways to implement the message relay:
- Polling Publisher: A background job polls the outbox table on a schedule (e.g., every 100ms), finds unpublished rows, publishes them to the broker, and marks them as published. It is simple to implement, but adds latency proportional to the polling interval and puts steady query load on the database, even when there is nothing to publish.
- Log-Tailing (CDC): Use Change Data Capture to watch the database's transaction log (WAL in Postgres, binlog in MySQL) for inserts into the outbox table. A CDC tool such as Debezium captures these changes and publishes them to Kafka in near-real time. This avoids polling queries against the table and typically achieves sub-second latency, but requires running CDC infrastructure.
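The polling publisher can be sketched in a few lines. This is a minimal, single-instance sketch assuming SQLite as the database and a hypothetical `publish` function that appends to a list in place of a real broker client; `poll_once` performs one polling cycle, and a real relay would call it in a loop with a short sleep between cycles.

```python
import json
import sqlite3

# Minimal outbox table in SQLite; the article's Postgres schema uses
# UUID/JSONB/TIMESTAMPTZ, simplified here. created_at is an integer for ordering.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outbox (
    id           TEXT PRIMARY KEY,
    event_type   TEXT NOT NULL,
    payload      TEXT NOT NULL,
    created_at   INTEGER NOT NULL,
    published_at TEXT                -- NULL means unpublished
);
INSERT INTO outbox (id, event_type, payload, created_at) VALUES
    ('e1', 'OrderPlaced', '{"orderId": "order-1"}', 1),
    ('e2', 'OrderPlaced', '{"orderId": "order-2"}', 2);
""")

# Stands in for the message broker.
published: list[tuple[str, str, dict]] = []


def publish(event_id: str, event_type: str, payload: dict) -> None:
    # Hypothetical broker call; a real relay would use a Kafka/RabbitMQ client here.
    published.append((event_id, event_type, payload))


def poll_once(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Run one polling cycle: publish unpublished rows oldest first, then mark them."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY created_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_id, event_type, json.loads(payload))
        # Mark only after a successful publish. A crash between the publish and
        # this update re-publishes the row on the next poll (at-least-once).
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP WHERE id = ?",
                (event_id,),
            )
    return len(rows)


print(poll_once(conn), poll_once(conn))  # 2 0
```

Marking each row only after its publish succeeds is what makes this relay at-least-once rather than at-most-once. If several relay instances poll the same Postgres table, one common refinement is to select each batch with `FOR UPDATE SKIP LOCKED` inside a transaction so instances do not claim the same rows, plus a partial index on `published_at IS NULL` to keep each poll cheap.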
At-Least-Once and Idempotency
The outbox pattern guarantees at-least-once delivery: the relay may publish the same message more than once if it crashes after publishing but before marking the row as published. Consumers must therefore be idempotent. Each event should carry a unique ID (the outbox row's `id` serves well), and consumers track processed IDs and skip duplicates. True exactly-once processing requires coordination at both the producer and consumer level (e.g., Kafka transactional producers plus consumer idempotency); because the broker publish and the outbox update are never atomic, consumer-side deduplication is what ultimately absorbs relay retries.
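A minimal sketch of an idempotent consumer, assuming the consumer keeps its own SQLite database with a hypothetical `processed_events` table keyed by event ID and a hypothetical `shipments` table for its side effect. Recording the event ID and applying the side effect in one transaction matters: if they were committed separately, a crash between them would either lose the effect or let a duplicate through.

```python
import sqlite3

# Consumer-side database: the dedup record and the business effect live in the
# same store, so both can be committed in a single transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE shipments (order_id TEXT NOT NULL);
""")


def handle_order_placed(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply the event once; return False if its ID was already processed."""
    with conn:
        # The primary key rejects a repeated event ID; OR IGNORE turns that
        # rejection into "0 rows inserted" instead of an exception.
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["eventId"],),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: skip the side effect
        conn.execute("INSERT INTO shipments (order_id) VALUES (?)", (event["orderId"],))
        return True


# The relay crashed after publishing but before marking the outbox row,
# so the same event (same eventId) is delivered twice.
event = {"eventId": "e1", "orderId": "order-123"}
results = [handle_order_placed(conn, event), handle_order_placed(conn, event)]
print(results)  # [True, False]
print(conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0])  # 1
```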
Outbox Libraries
You rarely need to implement the outbox from scratch. Debezium can serve as the CDC-based relay, and outbox implementations exist in the Spring Boot, .NET, and Go ecosystems. Debezium's Outbox Event Router (a single message transform applied to the connector's change events) routes each outbox row to a Kafka topic derived from its aggregate type, which makes it a production-ready option.
Interview Tip
The dual-write problem is a classic interview trap: many candidates propose updating the DB and publishing to Kafka in two steps without noticing the failure window between them. Proactively identifying this problem and proposing the Transactional Outbox signals production experience. Describe both the polling publisher (simpler, higher latency) and the CDC/Debezium approach (more infrastructure, near-real-time), and point out that at-least-once delivery makes consumer idempotency mandatory.