Rolling Update
Incrementally replace instances with new versions: max surge, max unavailable, health check integration, and handling backward-incompatible changes.
What Is a Rolling Update?
A rolling update incrementally replaces instances of the old version with the new version, one batch at a time. At no point during the deployment is the service completely down — there are always some instances of the old or new version serving traffic. This gives you zero downtime without the full infrastructure cost of blue-green.
Rolling updates are the default deployment strategy in Kubernetes (`RollingUpdate` is the default `Deployment` strategy). They are also native to AWS Auto Scaling Groups, Elastic Beanstalk, and most PaaS platforms. Understanding rolling updates deeply — including their limitations — is essential for any backend or platform interview.
Key Parameters: maxSurge and maxUnavailable
The Kubernetes rolling update strategy is controlled by two parameters. Both accept absolute numbers or percentages; percentages resolve against the replica count, with `maxSurge` rounding up and `maxUnavailable` rounding down:
| Parameter | Definition | Default | Effect when increased |
|---|---|---|---|
| `maxUnavailable` | Max number of pods that can be unavailable during the update | 25% | Faster rollout, more capacity reduction |
| `maxSurge` | Max number of pods that can exist above the desired count | 25% | Faster rollout, higher temporary cost |
```yaml
# Kubernetes Deployment rolling update configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payment-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2           # allow up to 12 pods total (10 + 2)
      maxUnavailable: 0     # never drop below 10 ready pods
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: payment-service:v2.0
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
```

Setting `maxUnavailable: 0` and `maxSurge: N` is the safest configuration for production: capacity never drops below the desired count. New pods are started (up to the surge limit), and old pods are terminated only after the new ones pass their readiness checks, so the rollout cannot degrade service.
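To build intuition for how these parameters control rollout speed, here is a small Python model (an illustration, not the actual Kubernetes controller logic) that counts scheduling rounds under the simplifying assumption that every new pod becomes ready before the next round begins:

```python
import math

def rollout_steps(replicas, max_surge, max_unavailable):
    """Count scheduling rounds for a rolling update (simplified model:
    every new pod passes readiness before the next round starts)."""
    def resolve(value, round_up):
        # Percentages resolve against the replica count:
        # maxSurge rounds up, maxUnavailable rounds down.
        if isinstance(value, str) and value.endswith("%"):
            frac = replicas * int(value.rstrip("%")) / 100
            return math.ceil(frac) if round_up else math.floor(frac)
        return value

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    batch = surge + unavailable  # old pods replaceable per round
    if batch == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return math.ceil(replicas / batch)

# 10 replicas, maxSurge=2, maxUnavailable=0 -> 2 pods per round, 5 rounds
print(rollout_steps(10, 2, 0))          # 5
# Kubernetes defaults: 25% surge (up to 3), 25% unavailable (down to 2)
print(rollout_steps(10, "25%", "25%"))  # 2
```

The model makes the trade-off in the table concrete: the safest setting (`maxUnavailable: 0`) is also the slowest for a given surge budget, since only the surge pods drive progress.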
Health Checks Are Critical
A rolling update without properly configured health checks is dangerous. Kubernetes offers three probe types; the two that matter most during a rollout are:
- Readiness probe: Gates when a pod starts receiving traffic. If the readiness probe fails, Kubernetes removes the pod from the service endpoint (stops sending it traffic) but does NOT restart it. New pods in a rolling update must pass readiness before old pods are terminated.
- Liveness probe: Detects when a container is unhealthy and needs to be restarted. A failing liveness probe causes Kubernetes to kill and restart the container. Do not set liveness thresholds too aggressively, or slow startups will trigger restart loops; for slow-starting containers, use a `startupProbe` to hold off liveness checks until the application is up.
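A sketch of how these probes might sit side by side on the container from the Deployment above (the `/health/live` path and all thresholds are illustrative assumptions, not values from the original manifest); note that the liveness thresholds are deliberately looser than the readiness ones:

```yaml
containers:
  - name: payment-service
    image: payment-service:v2.0
    readinessProbe:        # gates traffic; failure removes pod from endpoints
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:         # detects wedged processes; failure restarts the container
      httpGet:
        path: /health/live     # assumed endpoint, distinct from readiness
        port: 8080
      periodSeconds: 10
      failureThreshold: 6      # tolerate ~60s of failures before restarting
    startupProbe:          # holds off liveness checks while the app boots
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 5
      failureThreshold: 30     # allow up to ~150s of startup time
```

Keeping liveness looser than readiness means a struggling pod is first quietly removed from traffic, and only restarted if it stays unhealthy.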
The Readiness Probe Timing Gap
There is a subtle timing gap in Kubernetes: endpoint changes propagate asynchronously to the kube-proxy on every node. A newly ready pod may briefly miss traffic, and a terminating pod may keep receiving traffic for a short window after it is sent SIGTERM. Mitigate this on the old pods: delay shutdown briefly so late-arriving and in-flight connections can drain, and set `terminationGracePeriodSeconds` to at least 30 seconds so the drain can complete before the pod is killed.
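One common mitigation, sketched below, is a `preStop` sleep so the terminating pod keeps serving while endpoint removal propagates, combined with a grace period long enough to cover the sleep plus draining (the specific numbers are illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 45   # must cover the preStop sleep + drain time
  containers:
    - name: payment-service
      image: payment-service:v2.0
      lifecycle:
        preStop:
          exec:
            # Keep serving briefly while every node's kube-proxy drops this
            # endpoint; only after the sleep does the container receive SIGTERM,
            # at which point the app should drain in-flight requests and exit.
            command: ["sleep", "10"]
```

The grace period starts counting when termination begins, so it must exceed the `preStop` duration or the pod will be killed mid-drain.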
The Backward Compatibility Problem
During a rolling update, both versions are serving traffic simultaneously. This is the fundamental constraint that makes rolling updates more complex than blue-green. You must ensure:
- API backward compatibility: If v2.0 changes an API response format, clients may receive either the old or the new format during the rollout window. Make changes additive: add fields, never rename or remove them, and write clients to ignore unknown fields (the tolerant reader pattern).
- Message format compatibility: If services communicate via a queue or event stream, v1.0 consumers may receive messages produced by v2.0 producers (and vice versa). Use schema evolution patterns (Avro, Protobuf with backward-compatible changes).
- Database compatibility: Any schema change applied during a rolling update must work with both v1.0 and v2.0 simultaneously. Use the expand-contract pattern here as well: expand the schema compatibly first, migrate the code, then contract once v1.0 is fully gone.
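As one illustration of the API constraint, a consumer can be written as a tolerant reader that handles both response shapes during the mixed-version window. The field names below are hypothetical, chosen only to show the pattern:

```python
def parse_charge(payload: dict) -> dict:
    """Tolerant reader for a hypothetical payment API response.
    v2.0 adds a 'currency' field; v1.0 responses omit it.
    Unknown fields are silently ignored rather than rejected."""
    return {
        "charge_id": payload["charge_id"],           # present in both versions
        "amount_cents": payload["amount_cents"],     # present in both versions
        "currency": payload.get("currency", "USD"),  # new in v2.0; default for v1.0
    }

# During the rollout, either shape may arrive from the same service name:
old = parse_charge({"charge_id": "ch_1", "amount_cents": 500})
new = parse_charge({"charge_id": "ch_2", "amount_cents": 700,
                    "currency": "EUR", "fee_cents": 30})  # extra field ignored
print(old["currency"], new["currency"])  # USD EUR
```

The same discipline applies in reverse: a v1.0 consumer that ignores unknown fields will not break when a v2.0 producer adds them.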
Rollback
Rolling updates support rollback, but rollback is another rolling update in reverse — it is not instant. If you've already rolled out 80% of pods to v2.0, a rollback will take similar time to roll them back to v1.0. For truly instant rollback, you need blue-green. Use `kubectl rollout undo deployment/my-app` to trigger a rollback.
| Strategy | Rollback Speed | Cost Overhead | Mixed-Version Window |
|---|---|---|---|
| Rolling Update | Slow (proportional to instance count) | None — no extra infra | Yes — both versions live simultaneously |
| Blue-Green | Instant (seconds) | 2x infrastructure cost | No — clean cut |
| Canary | Fast (scale canary to 0%) | Small (canary % of infra) | Yes — but controlled % |
Interview Tip
Rolling updates are the first deployment strategy most engineers encounter (it's the Kubernetes default). In interviews, show sophistication by immediately raising the mixed-version window problem. Say: 'During a rolling update, both old and new versions are live simultaneously. This means any API contract or schema change must be backward compatible — or we need a two-phase migration. This is the key constraint that makes rolling updates harder than they appear at first glance.'