Distributed Transactions: 2PC & 3PC
Coordinating transactions across services: two-phase commit, three-phase commit, blocking problems, and why modern systems often avoid them.
Why Distributed Transactions Are Hard
A distributed transaction spans multiple databases, services, or nodes. The fundamental challenge: each participant must either all commit or all abort, yet any participant (or the network) can fail at any point. Without a coordination protocol, you risk one service committing while another aborts — leaving data permanently inconsistent.
ACID across services
ACID guarantees (Atomicity, Consistency, Isolation, Durability) are straightforward within a single database engine. Achieving them across multiple independent services requires explicit coordination protocols, and those protocols come with significant performance and availability costs.
Two-Phase Commit (2PC)
Two-Phase Commit is the classic distributed transaction protocol. A coordinator node orchestrates two phases: in Phase 1 (Prepare), the coordinator asks all participants whether they can commit. In Phase 2 (Commit/Abort), based on all responses, the coordinator sends the final decision.
The Blocking Problem
2PC is a blocking protocol. If the coordinator crashes after sending PREPARE but before sending the commit decision, participants are stuck holding locks indefinitely. They cannot unilaterally decide to commit or abort — aborting might violate atomicity if other participants already committed. This is called the in-doubt window: participants must wait for the coordinator to recover.
2PC failure scenarios
Coordinator failure after prepare: all participants block until coordinator recovers. Participant failure after voting YES: coordinator must wait or abort. Network partition: participants on the wrong side of the partition cannot make progress. In all these cases, locks are held, reducing throughput significantly.
Three-Phase Commit (3PC)
Three-Phase Commit adds a `pre-commit` phase to eliminate the blocking problem in 2PC. The three phases are: CanCommit (equivalent to Prepare), PreCommit (coordinator and participants acknowledge readiness), and DoCommit (final commit). The key insight is that after all participants acknowledge PreCommit, a participant that loses contact with the coordinator can safely commit on its own — it knows all others are also in the PreCommit state.
| Property | 2PC | 3PC |
|---|---|---|
| Phases | Prepare, Commit/Abort | CanCommit, PreCommit, DoCommit |
| Blocking on coordinator failure | Yes — indefinite | No — participants can proceed |
| Message complexity | 2N messages (prepare + commit) | 3N messages |
| Network partition safe | No | No (can cause split-brain) |
| Latency | Lower (fewer messages) | Higher (extra round trip) |
| Used in practice | PostgreSQL FDW, XA transactions | Rare — mostly theoretical |
3PC is not partition-tolerant
Despite solving the blocking problem, 3PC can cause split-brain under network partitions. If the partition occurs between PreCommit and DoCommit, the two partitions can independently decide differently (one aborts due to timeout, the other commits). For this reason, 3PC is rarely used in real distributed systems.
Modern Alternatives
Most modern microservice architectures avoid distributed transactions entirely. The alternatives include:
- Saga pattern: A sequence of local transactions, each publishing events that trigger the next. Compensating transactions handle rollback. Used by Uber, Amazon, and most event-driven architectures.
- Outbox pattern: Write to local DB and an outbox table atomically, then a relay process publishes events. Guarantees at-least-once delivery without distributed transactions.
- Eventual consistency: Accept that different services will be temporarily inconsistent, and design the UX to handle it (e.g., show 'pending' state).
- Google Spanner: Uses TrueTime-based timestamps to achieve external consistency with Paxos, avoiding traditional 2PC coordinator bottlenecks at scale.
Interview Tip
Interviewers often ask 'how do you handle a transaction that spans two microservices?' The correct answer is NOT '2PC'. Explain that 2PC introduces tight coupling and a blocking coordinator. Instead, describe the Saga pattern with compensating transactions, and mention the Outbox pattern for reliable event publishing. This shows you understand the operational realities of distributed systems.