
Back-of-the-Envelope Estimation

Master the art of quick estimation: latency numbers, storage calculations, QPS estimates, and the systematic approach interviewers expect.

15 min read · High interview weight

Why Estimation Matters

Back-of-the-envelope estimation is the practice of making rough, order-of-magnitude calculations to inform design decisions. It answers questions like: 'Do I need sharding?', 'Will a single cache node suffice?', 'How much bandwidth does this feature require?'. Without estimation, design decisions are guesses; with estimation, they are informed engineering choices.

In interviews, estimation serves a dual purpose: it demonstrates quantitative thinking, and it reveals the scale the design must handle — which in turn drives every architectural choice that follows.

💡

Interview Tip

Always do estimation explicitly and verbally in your interview, even if the interviewer hasn't asked for it. After the clarification phase, say: 'Let me quickly estimate the scale before I design.' Then derive QPS, storage, and bandwidth. This shows you're a quantitative thinker and reveals whether your design actually needs a caching layer, sharding, a CDN, etc.

Essential Reference Numbers

Memorize these numbers; they are the raw material of every estimation. Jeff Dean (Google Fellow) published a famous list of 'Numbers Everyone Should Know' that has become an industry-standard reference:

| Operation | Latency | Notes |
| --- | --- | --- |
| L1 cache hit | ~1 ns | Fastest possible; data is in CPU cache |
| L2 cache hit | ~10 ns | Still on-chip but slower |
| Main memory (RAM) access | ~100 ns | 100x slower than L1 |
| SSD random read (NVMe) | ~100 µs | ~1000x slower than RAM |
| HDD random read (seek) | ~10 ms | Mechanical disk: avoid random I/O |
| Network in same datacenter | ~500 µs | Intra-DC; predictable |
| Network round-trip (cross-continent) | ~150 ms | Speed of light + routing |
| Network round-trip (global) | ~300 ms | Worst case; drives CDN decisions |

| Unit | Value | Notes |
| --- | --- | --- |
| 1 KB | 1,000 bytes | A tweet, a short email |
| 1 MB | 1,000,000 bytes | A compressed photo, 1 min of audio |
| 1 GB | 1,000,000,000 bytes | A feature film (compressed), 1000 photos |
| 1 TB | 1,000,000,000,000 bytes | A hard drive, ~1M photos |
| 1 PB | 1,000 TB | Large data warehouse, CDN cache |
| Seconds in a day | 86,400 s | Use 100,000 for quick math (~15% error) |
| Seconds in a year | ~31.5 million s | Use 30M for quick math |
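The ~15% figure in the last row is easy to sanity-check with a couple of lines of Python (a quick sketch; the variable names are mine):

```python
# How far off is the "use 100,000 seconds per day" shortcut?
SECONDS_PER_DAY = 86_400
APPROX_SECONDS = 100_000

error = (APPROX_SECONDS - SECONDS_PER_DAY) / SECONDS_PER_DAY
print(f"Overestimate: {error:.1%}")  # Overestimate: 15.7%
```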

Converting Traffic to QPS

Requests Per Second (QPS, or RPS) is the foundational metric. Most requirements are stated in daily active users (DAU) or monthly active users (MAU). Here is the conversion:

```text
# QPS estimation formula:
# Average QPS = Total requests per day / 86,400 seconds

# Useful shortcuts:
# 1M requests/day    ≈ 12 QPS
# 10M requests/day   ≈ 116 QPS
# 100M requests/day  ≈ 1,160 QPS
# 1B requests/day    ≈ 11,600 QPS

# Peak QPS rule of thumb:
# Peak is typically 2x–3x average (traffic is not uniform)
# Peak QPS ≈ Average QPS × 3

# Example: Twitter with 300M DAU, 5 tweets viewed per user per minute
# Reads per minute = 300M × 5 = 1.5B reads/min
# Reads per second (average) = 1.5B / 60 = 25M QPS
# Peak QPS ≈ 25M × 3 = 75M QPS  (major events like New Year)
```
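The conversion above can be wrapped in two small Python helpers (an illustrative sketch; the function names are my own, not from any library):

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # rule of thumb: peak traffic is ~3x the daily average

def avg_qps(requests_per_day: float) -> float:
    """Average queries per second for a given daily request volume."""
    return requests_per_day / SECONDS_PER_DAY

def peak_qps(requests_per_day: float) -> float:
    """Peak QPS using the 3x rule of thumb."""
    return avg_qps(requests_per_day) * PEAK_FACTOR

print(round(avg_qps(1_000_000)))      # ≈ 12 QPS for 1M requests/day
print(round(avg_qps(1_000_000_000)))  # ≈ 11,574 QPS for 1B requests/day
print(round(peak_qps(1_000_000)))     # ≈ 35 peak QPS for 1M requests/day
```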

Storage Estimation

Storage grows over time. Always estimate how much new data is created per day and multiply by your retention horizon to get total storage requirements.

```text
# Storage estimation formula:
# Daily storage = (Daily writes) × (Average object size)
# Total storage = Daily storage × Retention period (days)
# Add replication factor (usually 3x for distributed systems)

# Example: URL shortener
# - 100M new URLs per day
# - Each URL record: shortcode (7 bytes) + long URL (100 bytes avg) + metadata (50 bytes)
#   ≈ 160 bytes; round up to ~200 bytes to cover index overhead
# - Daily storage = 100M × 200 bytes = 20 GB/day
# - 5-year retention = 20 GB × 365 × 5 = 36.5 TB raw
# - With 3x replication = 36.5 TB × 3 = ~110 TB total

# Example: Image hosting (Instagram scale)
# - 100M new photos per day
# - Average photo: 3 MB original, 500 KB thumbnail, 100 KB micro-thumbnail
# - Per photo storage: 3.6 MB total
# - Daily storage: 100M × 3.6 MB = 360 TB/day
# - Annual: 360 TB × 365 = ~131 PB/year (→ CDN + object storage required)
```
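The storage formula can be sketched as a small Python function that reproduces the URL-shortener numbers (illustrative only; the function name and return shape are my assumptions):

```python
def storage_estimate(writes_per_day: float, avg_object_bytes: float,
                     retention_days: int, replication: int = 3) -> tuple:
    """Return (daily_bytes, raw_total_bytes, replicated_total_bytes)."""
    daily = writes_per_day * avg_object_bytes
    raw = daily * retention_days
    return daily, raw, raw * replication

# URL shortener: 100M new URLs/day at ~200 bytes each, kept for 5 years
daily, raw, total = storage_estimate(100e6, 200, 365 * 5)
print(daily / 1e9, "GB/day")     # 20.0 GB/day
print(raw / 1e12, "TB raw")      # 36.5 TB raw
print(total / 1e12, "TB total")  # 109.5 TB with 3x replication
```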

Bandwidth Estimation

Bandwidth is storage-per-second. High bandwidth requirements drive CDN usage, content compression, and chunked streaming architectures.

```text
# Bandwidth estimation formula:
# Inbound bandwidth = QPS_writes × Average object size
# Outbound bandwidth = QPS_reads × Average object size

# Example: Video streaming (Netflix scale)
# - 200M concurrent streams during peak evening
# - Average video bitrate: 5 Mbps (HD)
# - Total outbound bandwidth: 200M × 5 Mbps = 1,000,000,000 Mbps = 1 Petabit/second
# → This is why Netflix uses a CDN (Open Connect) with servers inside ISPs.
#   Serving from a central datacenter at 1 Pbit/s is not feasible.

# Useful bandwidth benchmarks:
# Standard 1G NIC: 1 Gbps = 125 MB/s
# A single 10G NIC: 10 Gbps = 1.25 GB/s
# If your estimate exceeds ~100 Gbps of egress, you almost certainly need a CDN.
```
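A minimal Python sketch of the outbound-bandwidth formula, applied to the streaming example above (the function name is my own):

```python
def outbound_bandwidth_bps(concurrent_streams: float, bitrate_bps: float) -> float:
    """Total egress in bits per second for constant-bitrate streams."""
    return concurrent_streams * bitrate_bps

total_bps = outbound_bandwidth_bps(200e6, 5e6)  # 200M streams at 5 Mbps
print(total_bps / 1e15)   # 1.0 petabit per second
print(total_bps / 100e9)  # 10,000 fully saturated 100G links
```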

A Systematic Estimation Template

Use this template for every estimation. Going through it methodically takes 2-3 minutes and produces the numbers that drive every subsequent design decision.

  1. State your assumptions — DAU, requests per user per day, read/write ratio, average object size.
  2. Calculate write QPS — New data created per second.
  3. Calculate read QPS — Data read per second (often 10x–100x write QPS).
  4. Calculate daily storage growth — Write QPS × seconds/day × object size.
  5. Calculate total storage — Daily growth × retention period × replication factor.
  6. Calculate bandwidth — Read QPS × object size for outbound, write QPS × object size for inbound.
  7. Draw conclusions — Does this need a CDN? Sharding? A specialized storage tier?
📌

Full estimation example: Pastebin

Assumptions: 10M DAU, average 1 paste created per user per week, 10 reads per paste per day, average paste = 10 KB.
Write QPS = (10M × 1/7) / 86,400 ≈ 17 QPS.
Read QPS = (10M × 10) / 86,400 ≈ 1,160 QPS.
Read/write ratio ≈ 68:1 — very read-heavy.
Daily storage = 17 × 86,400 × 10 KB ≈ 14.7 GB/day.
10-year storage = 14.7 GB × 365 × 10 ≈ 53.7 TB; with 3x replication ≈ 160 TB.
Conclusion: a single database can handle the write QPS; read-heavy traffic benefits from a caching layer; paste content belongs in object storage; no CDN needed at this scale.
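The template can be run end to end in Python for this Pastebin example (a sketch under the same assumptions; exact arithmetic lands slightly below the rounded figures above):

```python
SECONDS_PER_DAY = 86_400
REPLICATION = 3

# Step 1: assumptions, taken from the example above
dau = 10_000_000
pastes_per_user_per_day = 1 / 7   # one paste per user per week
reads_per_paste_per_day = 10
paste_bytes = 10_000              # 10 KB

# Steps 2-3: write and read QPS
write_qps = dau * pastes_per_user_per_day / SECONDS_PER_DAY
read_qps = dau * reads_per_paste_per_day / SECONDS_PER_DAY

# Steps 4-5: daily storage growth, then 10-year total with replication
daily_bytes = write_qps * SECONDS_PER_DAY * paste_bytes
total_bytes = daily_bytes * 365 * 10 * REPLICATION

# Step 6: outbound bandwidth in bits per second
outbound_bps = read_qps * paste_bytes * 8

print(round(write_qps))           # 17 write QPS
print(round(read_qps))            # 1157 read QPS
print(round(total_bytes / 1e12))  # ≈ 156 TB (~160 TB in round numbers)
```

Step 7 is the judgment call the numbers support: single-node writes, cached reads, object storage for content.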

Common Estimation Pitfalls

  • Ignoring read/write ratio — Most systems are 100:1 or more read-heavy. Always ask 'what is the read/write ratio?' — it determines whether caching or write-optimized storage is the priority.
  • Forgetting replication — A 3x replication factor triples storage costs. Always multiply raw storage by the replication factor.
  • Forgetting peak vs average — Provision for peak traffic (3x average), not average. A system that handles average load but buckles on New Year's Eve is not a well-designed system.
  • Over-precision — Estimating 578.7 QPS instead of 'roughly 600 QPS' wastes time and implies false precision. Round aggressively and move on.
  • Ignoring the metadata — Stored objects have metadata (creation time, owner, tags, permissions) that can be significant. Don't forget to account for it.
💡

Interview Tip

Write your estimates in a structured way during the interview, not just mentally. Show the formula, plug in numbers, and state the conclusion: '17 QPS for writes — that's well within a single database. 1,160 QPS for reads — we'll need a cache here.' Writing it down creates a reference for both you and the interviewer, and prevents you from losing your place. Use round numbers: 86,400 seconds per day is fine as 100,000 for quick multiplication.
