Health Endpoint Monitoring
Expose health checks for load balancers and orchestrators: liveness vs readiness probes, deep health checks, and dependency monitoring.
What Is a Health Endpoint?
A health endpoint is a dedicated HTTP endpoint (typically `GET /health` or `GET /healthz`) that reports whether a service instance is operating correctly. Load balancers, Kubernetes, and monitoring systems poll this endpoint to decide whether to route traffic to the instance, restart it, or alert on-call engineers.
A well-designed health endpoint checks the service's actual ability to serve requests — not just that the process is running — by verifying connectivity to databases, caches, message brokers, and other critical dependencies.
Liveness vs Readiness Probes
Kubernetes distinguishes two health probe types with different semantics and consequences:
| Probe | Question It Answers | Failure Action | Checks |
|---|---|---|---|
| Liveness | Is this container alive (not stuck/deadlocked)? | Kill and restart the container | Process responsive, no deadlock. Avoid heavy dependency checks — a DB outage should NOT restart the container |
| Readiness | Is this container ready to serve traffic? | Remove from load balancer pool (don't restart) | Database reachable, migrations complete, warm-up done, dependency health acceptable |
| Startup (optional) | Has the container finished initializing? | Kill if startup takes too long | One-time check; prevents liveness from killing slow-starting containers |
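In a Kubernetes pod spec, the three probes above map onto per-container fields. A sketch, where the field names are real Kubernetes API fields but the paths, port, and timings are illustrative values rather than defaults:

```yaml
containers:
  - name: api
    livenessProbe:
      httpGet:
        path: /health/live   # cheap check: process responsive, no dependencies
        port: 8080
      periodSeconds: 10
      failureThreshold: 3    # restart after 3 consecutive failures
    readinessProbe:
      httpGet:
        path: /health/ready  # deep check: dependencies, warm-up
        port: 8080
      periodSeconds: 5
      failureThreshold: 3    # remove from Service endpoints; do NOT restart
    startupProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # allow up to ~300 s to start before liveness applies
```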
Don't Check External Dependencies in Liveness Probes
If your liveness probe checks the database and the database goes down, Kubernetes will restart every pod — causing a thundering-herd restart storm that makes recovery much harder. Liveness probes should only check that the application process itself is responsive (e.g., can it accept HTTP connections and respond to a simple endpoint).
Anatomy of a Deep Health Check
A deep health check returns the status of each critical dependency individually, enabling operators to quickly identify which component is failing. The response should include overall status plus per-component status.
```json
// GET /health/ready response
{
  "status": "degraded",
  "version": "2.3.1",
  "uptime": 3642,
  "checks": {
    "database": {
      "status": "healthy",
      "latency_ms": 4
    },
    "redis": {
      "status": "healthy",
      "latency_ms": 1
    },
    "payment_api": {
      "status": "unhealthy",
      "error": "Connection refused",
      "latency_ms": null
    },
    "disk_space": {
      "status": "healthy",
      "free_gb": 45.2
    }
  }
}
```

The overall status is `degraded` (not fully `healthy`) because the payment API is unreachable. The service might still be able to serve read requests, so it reports a non-fatal status rather than marking itself unhealthy entirely. Define per-component severity: a failing critical dependency (e.g., the database) makes the service `unhealthy`; a failing optional dependency only makes it `degraded`.
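That severity rule can be sketched as a pure function. The type names and the `critical` flag are illustrative, not part of any standard:

```typescript
type ComponentStatus = "healthy" | "degraded" | "unhealthy";

interface ComponentCheck {
  status: ComponentStatus;
  critical: boolean; // assumption: each check declares whether it is critical
}

// Fold per-component results into an overall status: any failing critical
// dependency makes the service unhealthy; a failing optional dependency
// only degrades it.
function overallStatus(checks: ComponentCheck[]): ComponentStatus {
  let overall: ComponentStatus = "healthy";
  for (const check of checks) {
    if (check.status !== "healthy") {
      if (check.critical) return "unhealthy";
      overall = "degraded";
    }
  }
  return overall;
}
```

Keeping this mapping in one place makes it easy to audit which dependencies can take the whole instance out of rotation.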
Health Check Implementation
```typescript
import express from "express";
import { db } from "./db";
import { redis } from "./cache";

const app = express();

// Liveness: is the process alive?
app.get("/health/live", (req, res) => {
  res.json({ status: "ok" });
});

// Readiness: can this instance serve traffic?
app.get("/health/ready", async (req, res) => {
  const checks: Record<string, object> = {};
  let overall: "healthy" | "degraded" | "unhealthy" = "healthy";

  // Database check
  try {
    const start = Date.now();
    await db.query("SELECT 1");
    checks.database = { status: "healthy", latency_ms: Date.now() - start };
  } catch (err) {
    checks.database = { status: "unhealthy", error: String(err) };
    overall = "unhealthy"; // critical dependency
  }

  // Redis check
  try {
    const start = Date.now();
    await redis.ping();
    checks.redis = { status: "healthy", latency_ms: Date.now() - start };
  } catch (err) {
    checks.redis = { status: "degraded", error: String(err) };
    if (overall === "healthy") overall = "degraded"; // non-critical
  }

  const statusCode = overall === "unhealthy" ? 503 : 200;
  res.status(statusCode).json({ status: overall, checks });
});
```

Health Checks in Load Balancers
AWS ALB and NLB poll a configured health check endpoint at a fixed interval. An instance is marked healthy after passing a configured number of consecutive checks (the healthy threshold) and unhealthy after failing a configured number of consecutive checks (the unhealthy threshold); this hysteresis prevents flapping on a single transient failure. Unhealthy instances are removed from the target group and receive no traffic. The health check path, port, protocol, interval (default 30 s), timeout, and both thresholds are all configurable.
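The threshold behavior can be sketched as a small state machine. The class and parameter names here are illustrative, not an AWS API:

```typescript
// Tracks one instance's health with hysteresis: the state flips to healthy
// only after `healthyThreshold` consecutive passes, and to unhealthy only
// after `unhealthyThreshold` consecutive failures.
class HealthTracker {
  private consecutive = 0; // length of the current streak opposing the state
  private healthy: boolean;

  constructor(
    private healthyThreshold = 3,
    private unhealthyThreshold = 2,
    initiallyHealthy = true,
  ) {
    this.healthy = initiallyHealthy;
  }

  // Record one probe result; returns the instance's current health state.
  record(passed: boolean): boolean {
    if (passed === this.healthy) {
      this.consecutive = 0; // result agrees with current state: reset streak
      return this.healthy;
    }
    this.consecutive++;
    const needed = passed ? this.healthyThreshold : this.unhealthyThreshold;
    if (this.consecutive >= needed) {
      this.healthy = passed; // sustained streak: flip the state
      this.consecutive = 0;
    }
    return this.healthy;
  }
}
```

With `healthyThreshold = 3` and `unhealthyThreshold = 2`, a single failed probe leaves routing unchanged; only a sustained streak of failures takes the instance out of rotation, and a sustained streak of passes brings it back.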
Health Check Standards
The Health Check API pattern (from Microsoft Azure patterns) and the IETF draft for `application/health+json` (draft-inadarei-api-health-check) both define standard response formats. Spring Boot Actuator (`/actuator/health`) and Node.js `@godaddy/terminus` provide batteries-included health check frameworks that integrate with Kubernetes probes out of the box.
Interview Tip
Health endpoints come up when discussing operational excellence and Kubernetes deployments. Key distinctions to make: liveness (restart the pod) vs readiness (remove from LB pool), why you must never check external dependencies in liveness probes, and how health checks enable zero-downtime deployments (readiness probe prevents traffic until the new pod is fully initialized). Mention the 'thundering herd restart' anti-pattern as a concrete failure mode.