Service Mesh (Istio & Envoy)

Transparent infrastructure for microservices: data plane (Envoy sidecars), control plane (Istio), traffic management, mutual TLS, and observability.

The Microservices Network Problem

In a microservices architecture with dozens of services, every service needs the same networking boilerplate: retries on transient failures, timeouts, circuit breaking to prevent cascading failures, mutual TLS for service authentication, and distributed tracing. Implementing this in every service, often in several languages, is expensive and inconsistent. A service mesh extracts this cross-cutting network logic into the infrastructure layer.

Architecture: Data Plane and Control Plane

Istio service mesh: Envoy sidecars form the data plane; Istiod is the control plane.

The data plane consists of Envoy proxy instances running as sidecars in every application Pod. iptables rules inside the Pod transparently redirect all inbound and outbound traffic through Envoy, so the application keeps calling other services by their usual hostnames while Envoy handles routing, retries, and encryption. The control plane (`Istiod`) pushes configuration to all Envoy instances using the xDS protocol and signs the certificates used for mTLS. Telemetry is emitted by the Envoy proxies themselves and collected by backends such as Prometheus.
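As a minimal sketch of how sidecars usually end up in Pods: labeling a namespace with `istio-injection: enabled` asks Istio's admission webhook to add the Envoy container to every new Pod created there. The namespace name `shop` is a hypothetical placeholder.

```yaml
# Assumption: "shop" is a made-up namespace name used only for illustration.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    # Tells Istio's sidecar injector to add an Envoy proxy to new Pods here.
    istio-injection: enabled
```

Pods that were already running before the label was added keep running without a sidecar until they are recreated.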

Envoy Proxy Capabilities

  • Retries with backoff — automatically retry failed requests with configurable limits and jitter
  • Timeouts — per-route and per-cluster deadline enforcement
  • Circuit breaking — cap concurrent connections, pending requests, and retries per upstream cluster, failing fast once a limit is reached
  • Load balancing — round-robin, least-request, random, consistent hashing
  • Outlier detection — temporarily eject hosts that keep failing (for example, consecutive 5xx responses) from the load-balancing pool
  • Rate limiting — local or global rate limiting via the rate limit service
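  • Distributed tracing — generates B3 / W3C trace context headers and reports spans to backends such as Jaeger or Zipkin (applications must still copy these headers from inbound to outbound requests)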
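In Istio, the retry and timeout capabilities listed above are typically configured per route on a `VirtualService`. The following is a sketch only: the host `ratings` and the specific values are illustrative assumptions, not recommendations.

```yaml
# Assumption: "ratings" is a hypothetical service; values are examples only.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
    - ratings
  http:
    - route:
        - destination:
            host: ratings
      # Overall deadline for the request, including all retry attempts.
      timeout: 10s
      retries:
        attempts: 3          # retry at most three times
        perTryTimeout: 2s    # deadline for each individual attempt
        # Envoy retry conditions that trigger another attempt.
        retryOn: gateway-error,connect-failure,refused-stream
```

Retries are only safe for idempotent operations, so a route that serves non-idempotent writes may need a narrower `retryOn` list or no retries at all.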

Istio Traffic Management

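Istio's `VirtualService` and `DestinationRule` resources give fine-grained traffic control without changing the Kubernetes Service or the application code. A `VirtualService` decides where a request is routed; a `DestinationRule` defines the subsets that routes can target and the policies applied to traffic once it has been routed.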

```yaml
# Canary: send 10% of traffic to v2, 90% to v1
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
---
# Defines the v1/v2 subsets referenced above, plus connection limits
# and outlier detection for the reviews service.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100      # cap on connections to the service
    outlierDetection:
      consecutive5xxErrors: 5    # eject a host after 5 consecutive 5xx responses
      interval: 10s              # how often hosts are evaluated
      baseEjectionTime: 30s      # minimum time an ejected host stays out
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```
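Besides weighted canaries, the same subsets can be used for header-based (A/B) routing. The sketch below assumes a hypothetical request header `x-experiment`; it would replace the weighted `VirtualService` above rather than sit alongside it, since both target the `reviews` host.

```yaml
# Assumption: the "x-experiment" header is hypothetical and set by the caller.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    # Rules are evaluated in order; the first match wins.
    - match:
        - headers:
            x-experiment:
              exact: "b"
      route:
        - destination:
            host: reviews
            subset: v2
    # Default route for all other requests.
    - route:
        - destination:
            host: reviews
            subset: v1
```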

Mutual TLS (mTLS)

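Istiod acts as an internal Certificate Authority (CA). When a Pod starts, the Istio agent in its sidecar generates a private key and sends a certificate signing request to Istiod, which returns a short-lived X.509 certificate containing the workload's SPIFFE identity (e.g., `spiffe://cluster.local/ns/default/sa/reviews`, derived from its namespace and service account). Certificates are rotated automatically before they expire. Envoy uses this certificate to authenticate and encrypt all inter-service connections. STRICT mTLS mode rejects any plaintext traffic; PERMISSIVE mode accepts both mTLS and plaintext (useful during migration).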
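The mTLS mode is set with a `PeerAuthentication` resource. As a minimal sketch, assuming the default root namespace `istio-system`, the following would enforce STRICT mode mesh-wide:

```yaml
# Assumption: istio-system is the mesh's root namespace (the default).
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT   # use PERMISSIVE while workloads are still being migrated
```

The same resource created in an application namespace would apply only to workloads in that namespace, which allows a gradual rollout.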

💡

mTLS enables zero-trust networking

With Istio mTLS in STRICT mode, every service-to-service call is authenticated. You can write AuthorizationPolicy rules that deny all traffic except explicitly allowed service pairs — implementing zero-trust at the network level without any application code.
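One way this could look in practice is a deny-by-default policy plus an explicit allow rule. The sketch assumes a hypothetical caller running under the `productpage` service account in the `default` namespace and a target workload labeled `app: reviews`.

```yaml
# An ALLOW policy with no rules matches nothing, so all requests to
# workloads in this namespace are denied unless another policy allows them.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: default
spec: {}
---
# Assumption: "productpage" service account and "app: reviews" label
# are illustrative names.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-allow-productpage
  namespace: default
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
    - from:
        - source:
            # Identity taken from the caller's mTLS certificate.
            principals: ["cluster.local/ns/default/sa/productpage"]
      to:
        - operation:
            methods: ["GET"]
```

Because `principals` is checked against the peer certificate, rules like this depend on mTLS being active between the two workloads.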

Service Mesh Trade-offs

| Benefit | Cost |
| --- | --- |
| Zero-code retries, timeouts, circuit breaking | Added latency (~1–3 ms per hop) |
| Automatic mTLS between all services | More CPU and memory per pod |
| Unified observability (traces, metrics) | Complex control plane to operate |
| Fine-grained traffic control (canary, A/B) | Steep learning curve for operators |

💡

Interview Tip

If asked 'how would you implement circuit breaking between microservices?', mention that in a service mesh like Istio you configure `connectionPool` limits and `outlierDetection` in a `DestinationRule`, with no code changes. Without a mesh, you'd use a library like Resilience4j (Java) or Polly (.NET) inside the application. Know both approaches and when each is appropriate.
