Rolling Update
Incrementally replace instances with new versions: max surge, max unavailable, health check integration, and handling backward-incompatible changes.
What Is a Rolling Update?
A rolling update incrementally replaces instances of the old version with the new version, one batch at a time. At no point during the deployment is the service completely down — there are always some instances of the old or new version serving traffic. This gives you zero downtime without the full infrastructure cost of blue-green.
Rolling updates are the default deployment strategy in Kubernetes (`RollingUpdate` is the default `Deployment` strategy). They are also native to AWS Auto Scaling Groups, Elastic Beanstalk, and most PaaS platforms. Understanding rolling updates deeply — including their limitations — is essential for any backend or platform interview.
Key Parameters: maxSurge and maxUnavailable
The Kubernetes rolling update strategy is controlled by two parameters. Both accept absolute numbers or percentages; percentages resolve against the replica count, with `maxSurge` rounding up and `maxUnavailable` rounding down:
| Parameter | Definition | Default | Effect when increased |
|---|---|---|---|
| `maxUnavailable` | Max number of pods that can be unavailable during the update | 25% | Faster rollout, more capacity reduction |
| `maxSurge` | Max number of pods that can exist above the desired count | 25% | Faster rollout, higher temporary cost |
```yaml
# Kubernetes Deployment rolling update configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payment-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2           # allow up to 12 pods total (10 + 2)
      maxUnavailable: 0     # never drop below 10 ready pods
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: payment-service:v2.0
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
```

Setting `maxUnavailable: 0` and `maxSurge: N` is the safest configuration for production: capacity never drops below the desired count. New pods are started (up to the surge limit), and old pods are terminated only after the new ones pass their readiness checks, so the rollout cannot degrade service.
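To build intuition for how these parameters control rollout speed, here is a small Python model (an illustration, not the actual Kubernetes controller logic) that counts scheduling rounds under the simplifying assumption that every new pod becomes ready before the next round begins:

```python
import math

def rollout_steps(replicas, max_surge, max_unavailable):
    """Count scheduling rounds for a rolling update (simplified model:
    every new pod passes readiness before the next round starts)."""
    def resolve(value, round_up):
        # Percentages resolve against the replica count:
        # maxSurge rounds up, maxUnavailable rounds down.
        if isinstance(value, str) and value.endswith("%"):
            frac = replicas * int(value.rstrip("%")) / 100
            return math.ceil(frac) if round_up else math.floor(frac)
        return value

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    batch = surge + unavailable  # old pods replaceable per round
    if batch == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return math.ceil(replicas / batch)

# 10 replicas, maxSurge=2, maxUnavailable=0 -> 2 pods per round, 5 rounds
print(rollout_steps(10, 2, 0))          # 5
# Kubernetes defaults: 25% surge (up to 3), 25% unavailable (down to 2)
print(rollout_steps(10, "25%", "25%"))  # 2
```

The model makes the trade-off in the table concrete: the safest setting (`maxUnavailable: 0`) is also the slowest for a given surge budget, since only the surge pods drive progress.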
Health Checks Are Critical
A rolling update without properly configured health checks is dangerous. Kubernetes offers three probe types; the two that matter most during a rollout are:
- Readiness probe: Gates when a pod starts receiving traffic. If the readiness probe fails, Kubernetes removes the pod from the service endpoint (stops sending it traffic) but does NOT restart it. New pods in a rolling update must pass readiness before old pods are terminated.
- Liveness probe: Detects when a container is unhealthy and needs to be restarted. A failing liveness probe causes Kubernetes to kill and restart the container. Do not set liveness thresholds too aggressively, or slow startups will trigger restart loops; for slow-starting containers, use a `startupProbe` to hold off liveness checks until the application is up.
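A sketch of how these probes might sit side by side on the container from the Deployment above (the `/health/live` path and all thresholds are illustrative assumptions, not values from the original manifest); note that the liveness thresholds are deliberately looser than the readiness ones:

```yaml
containers:
  - name: payment-service
    image: payment-service:v2.0
    readinessProbe:        # gates traffic; failure removes pod from endpoints
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:         # detects wedged processes; failure restarts the container
      httpGet:
        path: /health/live     # assumed endpoint, distinct from readiness
        port: 8080
      periodSeconds: 10
      failureThreshold: 6      # tolerate ~60s of failures before restarting
    startupProbe:          # holds off liveness checks while the app boots
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 5
      failureThreshold: 30     # allow up to ~150s of startup time
```

Keeping liveness looser than readiness means a struggling pod is first quietly removed from traffic, and only restarted if it stays unhealthy.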
The Readiness Probe Timing Gap
There is a subtle timing gap in Kubernetes: endpoint changes propagate asynchronously to the kube-proxy on every node. A newly ready pod may briefly miss traffic, and a terminating pod may keep receiving traffic for a short window after it is sent SIGTERM. Mitigate this on the old pods: delay shutdown briefly so late-arriving and in-flight connections can drain, and set `terminationGracePeriodSeconds` to at least 30 seconds so the drain can complete before the pod is killed.
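One common mitigation, sketched below, is a `preStop` sleep so the terminating pod keeps serving while endpoint removal propagates, combined with a grace period long enough to cover the sleep plus draining (the specific numbers are illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 45   # must cover the preStop sleep + drain time
  containers:
    - name: payment-service
      image: payment-service:v2.0
      lifecycle:
        preStop:
          exec:
            # Keep serving briefly while every node's kube-proxy drops this
            # endpoint; only after the sleep does the container receive SIGTERM,
            # at which point the app should drain in-flight requests and exit.
            command: ["sleep", "10"]
```

The grace period starts counting when termination begins, so it must exceed the `preStop` duration or the pod will be killed mid-drain.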
The Backward Compatibility Problem
During a rolling update, both versions are serving traffic simultaneously. This is the fundamental constraint that makes rolling updates more complex than blue-green. You must ensure:
- API backward compatibility: If v2.0 changes an API response format, clients may receive either the old or the new format during the rollout window. Make changes additive: add fields, never rename or remove them, and write clients to ignore unknown fields (the tolerant reader pattern).
- Message format compatibility: If services communicate via a queue or event stream, v1.0 consumers may receive messages produced by v2.0 producers (and vice versa). Use schema evolution patterns (Avro, Protobuf with backward-compatible changes).
- Database compatibility: Any schema change applied during a rolling update must work with both v1.0 and v2.0 simultaneously. Use the expand-contract pattern here as well: expand the schema compatibly first, migrate the code, then contract once v1.0 is fully gone.
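As one illustration of the API constraint, a consumer can be written as a tolerant reader that handles both response shapes during the mixed-version window. The field names below are hypothetical, chosen only to show the pattern:

```python
def parse_charge(payload: dict) -> dict:
    """Tolerant reader for a hypothetical payment API response.
    v2.0 adds a 'currency' field; v1.0 responses omit it.
    Unknown fields are silently ignored rather than rejected."""
    return {
        "charge_id": payload["charge_id"],           # present in both versions
        "amount_cents": payload["amount_cents"],     # present in both versions
        "currency": payload.get("currency", "USD"),  # new in v2.0; default for v1.0
    }

# During the rollout, either shape may arrive from the same service name:
old = parse_charge({"charge_id": "ch_1", "amount_cents": 500})
new = parse_charge({"charge_id": "ch_2", "amount_cents": 700,
                    "currency": "EUR", "fee_cents": 30})  # extra field ignored
print(old["currency"], new["currency"])  # USD EUR
```

The same discipline applies in reverse: a v1.0 consumer that ignores unknown fields will not break when a v2.0 producer adds them.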
Rollback
Rolling updates support rollback, but rollback is another rolling update in reverse — it is not instant. If you've already rolled out 80% of pods to v2.0, a rollback will take similar time to roll them back to v1.0. For truly instant rollback, you need blue-green. Use `kubectl rollout undo deployment/my-app` to trigger a rollback.
| Strategy | Rollback Speed | Cost Overhead | Mixed-Version Window |
|---|---|---|---|
| Rolling Update | Slow (proportional to instance count) | None — no extra infra | Yes — both versions live simultaneously |
| Blue-Green | Instant (seconds) | 2x infrastructure cost | No — clean cut |
| Canary | Fast (scale canary to 0%) | Small (canary % of infra) | Yes — but controlled % |
Interview Tip
Rolling updates are the first deployment strategy most engineers encounter (it's the Kubernetes default). In interviews, show sophistication by immediately raising the mixed-version window problem. Say: 'During a rolling update, both old and new versions are live simultaneously. This means any API contract or schema change must be backward compatible — or we need a two-phase migration. This is the key constraint that makes rolling updates harder than they appear at first glance.'