Fan-Out / Fan-In Pattern
Distribute work to many workers (fan-out) and aggregate results (fan-in): scatter-gather, MapReduce-style processing, and aggregation strategies.
The Core Idea
Fan-Out distributes a single task to multiple workers that process sub-problems in parallel. Fan-In aggregates the results from those workers back into a single unified result. Together, they form the Scatter-Gather pattern: scatter the work, gather the results. This is the messaging equivalent of parallel processing — the same idea behind MapReduce, parallel database query execution, and search engine result merging.
The pattern is deceptively simple conceptually but has many subtle implementation challenges: how do you know when all workers have finished? What if one worker fails? How do you aggregate partial results efficiently? How do you handle stragglers?
Fan-Out / Fan-In Flow
Fan-Out Patterns in Practice
Fan-out manifests in two primary forms depending on what you are distributing:
| Fan-Out Type | What Is Distributed | Example | Mechanism |
|---|---|---|---|
| Notification fan-out | The same event is sent to many recipients | Twitter/X follower notifications, push notifications to millions of users | SNS → SQS per region/shard, Kafka topic partitions, Redis Pub/Sub |
| Work fan-out | A large task is split into sub-tasks processed in parallel | Search across multiple index shards, batch image processing, MapReduce | Task queue (SQS/RabbitMQ) with competing consumers, Kafka partitions |
Notification Fan-Out at Scale: The Twitter Problem
Imagine a user with 10 million followers posts a tweet. Naively, you push a notification to each follower's feed — that is 10 million writes on a single user action. This is the canonical fan-out problem. Two architectural approaches:
| Approach | Description | Pro | Con |
|---|---|---|---|
| Fan-out on write (push model) | Write to every follower's feed at post time via async workers | Reads are instant — pre-computed feed | Expensive for high-follower users (celebrities); write amplification |
| Fan-out on read (pull model) | Compute the feed at read time by merging followed users' timelines | No write amplification | Read is expensive and slow for high-volume users; need caching |
| Hybrid | Fan-out on write for regular users, fan-out on read for celebrities | Best of both worlds | More complex; requires follower-count threshold logic |
Work Fan-Out: Aggregation Challenges
When fans-out distributes work (not just notifications), the fan-in aggregation is the tricky part. The orchestrator needs to know when all sub-tasks are complete before producing the final result.
Common aggregation strategies:
- Counter in shared state: orchestrator writes `expectedCount=N` to Redis; each worker atomically decrements; at 0, trigger final assembly. Fast but requires atomic operations.
- Correlation tracking table: persist each sub-task in a DB with status; query for completion. More durable but slower.
- Saga orchestrator pattern: orchestrator explicitly tracks which sub-tasks have replied and coordinates the fan-in step.
- Promise.all / Future aggregation: in code (not messaging), simply await all parallel futures and collect results.
// Redis-based fan-in coordination
async function fanOutSearch(query: string, shards: string[]): Promise<SearchResult[]> {
const jobId = crypto.randomUUID();
const expectedCount = shards.length;
// Initialize counter
await redis.set(`job:${jobId}:remaining`, expectedCount, "EX", 60);
await redis.set(`job:${jobId}:results`, JSON.stringify([]), "EX", 60);
// Fan-out: dispatch sub-query to each shard worker
for (const shard of shards) {
await queue.publish(`search.shard.${shard}`, { jobId, query });
}
// Wait for fan-in: poll until counter reaches 0
return waitForCompletion(jobId, 30_000 /* 30s timeout */);
}
// Each worker calls this after completing its shard search:
async function workerComplete(jobId: string, partialResults: SearchResult[]) {
const pipeline = redis.pipeline();
// Append results atomically
const current = JSON.parse(await redis.get(`job:${jobId}:results`) ?? "[]");
pipeline.set(`job:${jobId}:results`, JSON.stringify([...current, ...partialResults]), "EX", 60);
pipeline.decr(`job:${jobId}:remaining`);
await pipeline.exec();
}Handling Stragglers and Partial Failures
In any large fan-out, some workers will be slow (stragglers) or fail. Strategies to handle this:
- Timeouts with partial results: after a timeout, return the results you have with a `partial: true` flag. Search engines do this — missing one index shard is better than timing out entirely.
- Speculative execution (hedging): after a delay threshold, send the same sub-task to a second worker. Use whichever finishes first. Google's 'The Tail At Scale' paper popularized this.
- Retry with DLQ: failed sub-tasks go to a Dead Letter Queue for retry; the orchestrator is notified separately.
- Idempotent workers: make sub-task processing idempotent so retries are safe.
The 99th Percentile Problem
In a fan-out of 100 sub-tasks, your total latency is bounded by the slowest worker — the 99th percentile latency of a single worker becomes the median latency of the fan-out. Design for this: set per-worker timeouts aggressively, use speculative execution for critical paths, and prefer partial results over indefinite waits.
AWS SNS + SQS Fan-Out
The canonical AWS fan-out pattern uses SNS as the broadcaster and SQS queues as the per-service durable receivers. One SNS topic connects to multiple SQS queues — each queue is owned by a different downstream service. This gives each consumer its own independent, durable queue while the event is broadcast to all of them simultaneously.
Interview Tip
Fan-out/fan-in appears in almost every large-scale design question: notification systems, search, batch processing, and analytics. For notifications, cover push vs pull model and the celebrity/hotspot problem. For work fan-out, explain how you track completion (counter pattern), handle stragglers (timeouts + partial results), and guarantee exactly-once aggregation (idempotent workers). These details separate strong candidates from average ones.
Practice this pattern
Design a notification system for millions of users using fan-out