CAP Theorem & PACELC
Deep dive into the CAP theorem, its practical implications, and the PACELC extension that covers latency vs consistency trade-offs.
The CAP Theorem
Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, the CAP theorem states that a distributed data store can provide at most two of the following three guarantees simultaneously:
- Consistency (C) — Every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A) — Every request receives a (non-error) response, without guaranteeing that it contains the most recent data.
- Partition Tolerance (P) — The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
In practice, partition tolerance is non-negotiable. Network partitions happen — cables fail, switches misbehave, data centers lose connectivity. A distributed system that stops working during a partition is not useful. Therefore, the real design choice in distributed systems is between Consistency and Availability during a partition.
CP Systems: Consistency Over Availability
CP systems sacrifice availability during a partition. If nodes cannot coordinate, they refuse requests rather than risk serving stale data. This is appropriate for applications where correctness is non-negotiable: financial ledgers, inventory management, distributed lock managers.
ZooKeeper is a canonical CP system. It uses the ZAB (ZooKeeper Atomic Broadcast) consensus protocol to ensure all reads reflect the latest committed write. During a partition that isolates a minority of nodes, those nodes stop serving reads and writes. This makes ZooKeeper suitable for leader election and distributed configuration, but not for user-facing services that must stay available.
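The majority rule behind this behavior can be sketched in a few lines. This is an illustrative model of quorum logic, not ZooKeeper's actual ZAB implementation: a side of a partition may keep serving only if it still holds a strict majority of the ensemble.

```python
def has_quorum(side_size: int, ensemble_size: int) -> bool:
    """A partition side keeps serving only with a strict majority."""
    return side_size > ensemble_size // 2

# A 5-node ensemble split 3 / 2 by a partition:
print(has_quorum(3, 5))  # True  -> majority side keeps serving
print(has_quorum(2, 5))  # False -> minority side refuses requests
```

Note that a clean 50/50 split of an even-sized ensemble leaves *no* side with a majority, which is one reason odd ensemble sizes are recommended.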
AP Systems: Availability Over Consistency
AP systems continue to serve requests during a partition, but nodes may return stale data. The system accepts writes on all sides of a partition and reconciles conflicts when the partition heals. This is appropriate for applications where availability matters more than perfect consistency: social feeds, product catalogs, user preference settings.
Apache Cassandra, originally developed at Facebook for its Inbox Search feature, is a prototypical AP system. During a partition, Cassandra nodes on each side of the partition continue to accept writes. When the partition heals, Cassandra uses last-write-wins conflict resolution (based on write timestamps). Amazon's DynamoDB follows the same philosophy — default reads are eventually consistent, trading strict correctness for availability and low latency.
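Last-write-wins reconciliation can be sketched as follows. This is a hedged illustration of the general technique, applied per value; the names here are not Cassandra internals, and real systems track timestamps per column:

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    timestamp_us: int  # microsecond write timestamp

def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep whichever write carries the later timestamp."""
    return a if a.timestamp_us >= b.timestamp_us else b

# Writes accepted on opposite sides of a partition:
side1 = Versioned("cart=[book]", timestamp_us=1_700_000_000_000_001)
side2 = Versioned("cart=[book,pen]", timestamp_us=1_700_000_000_000_005)
print(lww_merge(side1, side2).value)  # "cart=[book,pen]" -> later write wins
```

The caveat is visible in the sketch: whichever write has the earlier timestamp is silently discarded, so clock skew between nodes can drop a logically newer write.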
Practical CAP Analysis
| Use Case | CAP Choice | Reasoning |
|---|---|---|
| Bank account balance | CP | Incorrect balance is worse than temporary unavailability |
| Social media like count | AP | Slight staleness is fine; refusing requests is not |
| Shopping cart contents | AP | Better to show a slightly stale cart than to fail checkout |
| Inventory reservation | CP | Overselling is a business problem; must coordinate |
| User profile preferences | AP | Showing yesterday's settings is acceptable |
| Distributed lock / leader election | CP | Two leaders is catastrophic; prefer unavailability |
Interview Tip
When discussing CAP in an interview, avoid getting stuck on the theory. Frame it practically: 'For the [feature], correctness is more important than availability because [business reason], so I'd use a CP database like [X]. For [other feature], availability is more important because [reason], so I'd use an AP store like [Y] and design around eventual consistency.' This shows you understand when theory translates into real decisions.
The PACELC Extension
A criticism of CAP is that it only describes behavior during partitions — a relatively rare event. PACELC (proposed by Daniel Abadi in 2010) extends CAP to also characterize behavior during normal operation:
PACELC formula
If there is a Partition (P), how does the system trade off Availability (A) vs Consistency (C)? Else (E), when the network is healthy, how does it trade off Latency (L) vs Consistency (C)?
The insight is that even without partitions, replication introduces a latency vs consistency trade-off. To achieve strong consistency, a write must be confirmed by multiple replicas before returning to the client — this takes time. To achieve low latency, you can return immediately after writing to one replica, at the risk of returning stale data on subsequent reads.
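This latency cost is easy to quantify in a toy model. In the sketch below (replica latencies are made-up numbers), a write that returns after the first acknowledgment is fast but leaves other replicas stale, while a write that waits for all acknowledgments pays the slowest replica's latency:

```python
def write_latency_ms(replica_latencies: list[float], acks_required: int) -> float:
    """Latency = time until the acks_required-th fastest replica responds."""
    return sorted(replica_latencies)[acks_required - 1]

latencies = [2.0, 5.0, 40.0]           # 3 replicas, one on a slow link
print(write_latency_ms(latencies, 1))  # 2.0  -> low latency, weaker consistency
print(write_latency_ms(latencies, 3))  # 40.0 -> strong consistency, pay the tail
```

Waiting for a quorum (2 of 3 here, 5.0 ms) is the usual middle ground: it tolerates the slowest replica while still guaranteeing overlap with quorum reads.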
| System | Partition behavior | Else behavior | PACELC label |
|---|---|---|---|
| Cassandra | Favors Availability | Favors Latency | PA/EL |
| CockroachDB | Favors Consistency | Favors Consistency | PC/EC |
| DynamoDB | Favors Availability | Favors Latency (default) | PA/EL |
| Spanner | Favors Consistency | Favors Consistency | PC/EC |
| MongoDB (default) | Favors Consistency | Favors Consistency | PC/EC |
Google Spanner: Defying CAP?
Google's Spanner is often described as 'effectively CA' — externally consistent, globally distributed, and highly available. This sounds like it violates CAP. The key is Spanner's use of TrueTime: GPS and atomic clocks in every data center allow Spanner to timestamp transactions with bounded clock uncertainty (typically under 7 ms). By waiting out this uncertainty window before acknowledging a commit (the 'commit wait'), Spanner guarantees that a transaction's timestamp is in the past on every node, which yields external consistency across data centers. It does sacrifice some latency (the commit wait), but this fits the PACELC model: Spanner is PC/EC — consistent during partitions (minority sides refuse requests) and consistent otherwise (by paying latency).
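The commit-wait rule reduces to simple arithmetic. This is a toy sketch, not Spanner's implementation: with an uncertainty bound of epsilon, a commit stamped at time t may not be acknowledged until t + epsilon has definitely passed, so the timestamp is guaranteed to lie in the past everywhere.

```python
def commit_wait_deadline(commit_ts_ms: float, epsilon_ms: float) -> float:
    """Earliest wall-clock time at which the commit may be acknowledged."""
    return commit_ts_ms + epsilon_ms

# With a 7 ms uncertainty bound, a commit stamped at t = 1000 ms
# is acknowledged no earlier than 1007 ms:
print(commit_wait_deadline(1000.0, 7.0))  # 1007.0
```

Shrinking epsilon (better clock hardware) directly shrinks the latency tax, which is why Spanner invests in GPS and atomic clocks.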
CAP in the real world
Most modern databases let you tune the trade-off. Cassandra's consistency levels (ONE, QUORUM, ALL) let you choose more consistency at the cost of higher latency and reduced availability, on a per-query basis. DynamoDB offers strongly consistent reads (at 2x the cost) alongside eventually consistent reads. The binary CP/AP classification is a simplification — real systems operate on a spectrum.
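The rule behind this tuning is quorum intersection: with N replicas, a read is guaranteed to see the latest write when the read set and write set must overlap, i.e. R + W > N. The sketch below uses Cassandra's level names (ONE, QUORUM, ALL), but the mapping to replica counts is an illustrative simplification:

```python
def replicas_for(level: str, n: int) -> int:
    """Replicas that must respond at a given consistency level."""
    return {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]

def strongly_consistent(read_level: str, write_level: str, n: int) -> bool:
    """Reads see the latest write when read and write sets must overlap."""
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n

n = 3
print(strongly_consistent("QUORUM", "QUORUM", n))  # True:  2 + 2 > 3
print(strongly_consistent("ONE", "ONE", n))        # False: 1 + 1 <= 3
print(strongly_consistent("ONE", "ALL", n))        # True:  1 + 3 > 3
```

This is why QUORUM reads plus QUORUM writes is a popular default: it gives read-your-writes consistency while tolerating one slow or failed replica out of three.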