Design a URL Shortener
Classic interview problem: hash generation, collision handling, redirect performance, analytics tracking, and scaling to billions of URLs.
Problem Statement
A URL shortener takes a long URL and returns a short alias — for example, turning `https://www.example.com/very/long/path?query=foo` into `https://sho.rt/aB3xQz`. When a user visits the short URL, the service redirects them to the original. This is one of the most frequently asked interview problems because it touches hashing, database design, caching, and high-throughput read optimization all in one compact problem.
Requirements
Functional Requirements
- Given a long URL, generate a unique short URL (alias).
- Redirect users from the short URL to the original long URL.
- Support custom aliases (e.g., `sho.rt/my-brand`).
- Optionally set an expiry time on short URLs.
- Track click analytics (count, geo, device, referrer).
Non-Functional Requirements
- High availability: redirects must succeed even if part of the system is down.
- Low latency: redirect should complete in < 10 ms at the 99th percentile.
- Durability: URLs must not be lost once created.
- Scale: support 100 M new URLs/day, 10 B redirects/day.
Capacity Estimation
| Metric | Assumption | Result |
|---|---|---|
| New URLs/day | 100 M | ~1,160 writes/sec |
| Redirects/day | 10 B (100:1 read/write) | ~115,740 reads/sec |
| Storage per URL | 500 bytes (URL + metadata) | |
| Storage for 5 years | 100 M × 365 × 5 × 500 B | ~91 TB |
| Cache size (20% hot) | 10 B × 0.2 × 500 B/day | ~1 TB/day hot set |
Estimation Tip
Always separate read and write QPS. URL shorteners are extremely read-heavy (100:1 ratio is typical). This drives you toward read replicas, aggressive caching, and a CDN-first redirect strategy.
High-Level Design
Short Code Generation
The core challenge is generating a unique, compact, URL-safe identifier for each long URL. There are three common approaches:
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| MD5 / SHA-256 Truncation | Hash the long URL, take first 7 chars | Stateless, fast | Collision possible; same URL gets same code |
| Base62 Auto-Increment ID | Encode DB auto-increment ID in base-62 | No collisions, predictable length | Exposes sequential IDs; coordination needed |
| Snowflake / UUID + Base62 | Distributed unique ID generator | No coordination, globally unique | Slightly more complex infrastructure |
Recommended approach: use a distributed ID generator (Snowflake-style) and encode the ID in base-62. A 64-bit ID encoded in base-62 produces a 7–11 character code that is collision-free and opaque. For custom aliases, store them directly and check for conflicts at write time.
import string
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits # 62 chars
def to_base62(num: int) -> str:
if num == 0:
return ALPHABET[0]
result = []
while num:
result.append(ALPHABET[num % 62])
num //= 62
return "".join(reversed(result))
# Example: ID 1_000_000_000 → "15ftgG" (6 chars)
print(to_base62(1_000_000_000)) # "15FtGG" (approx)API Design
| Method | Endpoint | Request Body | Response |
|---|---|---|---|
| POST | /api/v1/urls | `{ longUrl, customAlias?, expiresAt? }` | `201 { shortCode, shortUrl, expiresAt }` |
| GET | /:shortCode | — | `301/302 Location: <longUrl>` |
| GET | /api/v1/urls/:shortCode/stats | — | `200 { clicks, geo, devices }` |
| DELETE | /api/v1/urls/:shortCode | — | `204 No Content` |
301 vs 302 Redirect
Use 302 Found (temporary) if you want to track every click server-side — browsers won't cache it. Use 301 Moved Permanently if you want browsers and CDNs to cache it, reducing origin load. Most analytics-focused URL shorteners use 302.
Key Flow: Redirect
Database Schema
CREATE TABLE urls (
id BIGINT PRIMARY KEY, -- Snowflake ID
short_code VARCHAR(16) UNIQUE NOT NULL, -- base-62 encoded ID or custom alias
long_url TEXT NOT NULL,
user_id BIGINT, -- nullable for anonymous
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP, -- NULL = never expires
click_count BIGINT DEFAULT 0
);
CREATE INDEX idx_short_code ON urls(short_code); -- primary lookup path
CREATE TABLE clicks (
id BIGSERIAL PRIMARY KEY,
short_code VARCHAR(16) NOT NULL,
clicked_at TIMESTAMP DEFAULT NOW(),
ip_hash VARCHAR(64),
country VARCHAR(2),
user_agent TEXT
);Scaling Considerations
- Read replicas: 99% of traffic is reads (redirects). Route all reads to replicas, writes to the primary.
- Redis cluster: cache hot short codes in Redis with a TTL equal to the URL expiry. Eviction policy: `allkeys-lru`.
- CDN edge caching: for non-expiring URLs, cache the 301 redirect at the CDN edge — eliminates origin requests entirely.
- Sharding: shard the `urls` table by `short_code` hash for horizontal scale. Consistent hashing avoids rebalancing pain.
- Analytics pipeline: write click events to Kafka asynchronously; Flink or Spark aggregates into a data warehouse. Never write analytics in the hot redirect path.
- Rate limiting: apply per-user and per-IP limits at the API gateway to prevent abuse.
Common Pitfall: Bloom Filters
Some candidates suggest using a Bloom filter to check for existing short codes before DB lookup. While creative, it adds complexity without much gain if Redis already caches all active URLs. Mention it as an optimization you'd consider at very large scale, but don't lead with it.
Interview Tip
Interviewers look for three things in this problem: (1) a collision-free ID strategy you can defend, (2) a clear explanation of your caching hierarchy and why each layer exists, and (3) awareness of the 301 vs 302 trade-off. Mentioning analytics as an async concern (not in the hot path) signals production maturity.
Practice this pattern
Design a URL shortener like bit.ly