
Containerization with Docker

How containers work: images, layers, registries, networking, storage, resource limits, and the difference between containers and VMs.

15 min read · High interview weight

Containers vs Virtual Machines

A virtual machine (VM) bundles an entire OS kernel, system libraries, and your application. A container shares the host OS kernel and isolates only the application and its dependencies using two Linux primitives: namespaces (process, network, filesystem, IPC isolation) and cgroups (CPU, memory, I/O limits). The result: containers start in milliseconds instead of minutes and consume megabytes of RAM rather than gigabytes.
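These primitives are visible directly on a Linux host. A quick sketch (Linux-only; the `unshare` demo is an illustration and needs root):

```shell
# Every process belongs to a set of namespaces, exposed as symlinks under
# /proc/<pid>/ns. Two processes sharing an inode number share that namespace.
ls -l /proc/self/ns 2>/dev/null || echo "requires a Linux /proc"

# A container runtime creates fresh namespaces; util-linux can too:
#   sudo unshare --pid --fork --mount-proc ps aux   # sees only itself
```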

| Property | Virtual Machine | Container |
| --- | --- | --- |
| Boot time | 30–120 seconds | < 1 second |
| Size | Gigabytes (full OS) | Megabytes (app + libs) |
| OS kernel | Each VM has its own | Shared with host |
| Isolation | Strong (hypervisor) | Process-level (namespaces) |
| Portability | Medium (hypervisor-specific) | High (any Docker host) |
| Density | Tens per host | Hundreds per host |
ℹ️

When VMs still win

Use VMs when you need strong multi-tenant isolation (e.g., running untrusted customer code), OS-level customization, or Windows workloads on a Linux host. A container is not as strong a security boundary as a hypervisor.

Docker Image Layers

Every `Dockerfile` instruction (`FROM`, `RUN`, `COPY`, `ADD`) creates an immutable layer. Docker stacks these layers using a Union File System (typically `overlayfs`). Layers are content-addressed by SHA256 hash and cached — if you change a late instruction, Docker only rebuilds from that point down. This makes iterative builds fast.

[Diagram: Docker image layer stack. The container adds a thin writable layer on top of read-only image layers.]
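You can inspect a built image's layer stack with `docker history`. This sketch assumes a local Docker daemon and that the tag (here `node:20-alpine`, just an example) has been pulled:

```shell
# Each row is one Dockerfile instruction and the size of the layer it produced.
docker history node:20-alpine 2>/dev/null || echo "docker daemon not available"

# The content-addressed layer digests live in the image metadata:
#   docker image inspect --format '{{json .RootFS.Layers}}' node:20-alpine
```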

Writing Efficient Dockerfiles

Layer ordering matters. Put instructions that change least often (OS packages, dependency installation) early and app code late so cache invalidation is minimal. Use `.dockerignore` to exclude `node_modules`, `.git`, and build artifacts from the build context.
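A minimal `.dockerignore` for a Node project might look like this (entries are illustrative; adjust to your build):

```
node_modules
.git
dist
*.log
.env
```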

```dockerfile
# ── Bad: invalidates the dependency cache on every code change ──
FROM node:20-alpine
COPY . .
RUN npm ci
CMD ["node", "src/index.js"]

# ── Good: cache npm install separately from app code ──
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # --omit=dev replaces the deprecated --only=production

FROM node:20-alpine AS runner
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY src/ ./src/
EXPOSE 3000
USER node
CMD ["node", "src/index.js"]
```

The multi-stage build above separates the dependency installation stage from the final runtime image. The `--from=deps` flag copies only `node_modules`, keeping the runner stage clean and small. Running as `USER node` (non-root) is a security best practice.

Container Networking

Docker creates a virtual bridge network (`docker0`) by default. Each container gets its own network namespace with a virtual ethernet pair. Docker's embedded DNS resolver lets containers address each other by container or service name, but only on user-defined bridge networks and in Docker Compose, not on the default bridge.

| Network Mode | Use Case | Isolation |
| --- | --- | --- |
| bridge (default) | Multi-container on same host | Container-level |
| host | Performance-critical, low latency | None (shares host network) |
| overlay | Multi-host (Docker Swarm) | Cross-host |
| none | No network access needed | Full |
| macvlan | Container needs its own MAC/IP | Appears as physical device |
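A guarded sketch of name-based DNS on a user-defined bridge (assumes a local Docker daemon; `app-net` and `db` are hypothetical names):

```shell
# List networks; 'bridge' is docker0. User-defined networks get name-based DNS.
docker network ls 2>/dev/null || echo "docker daemon not available"

# Typical flow (names are illustrative):
#   docker network create app-net
#   docker run -d --network app-net --name db postgres:16
#   docker run --rm --network app-net alpine ping -c1 db   # embedded DNS resolves 'db'
```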

Storage: Volumes vs Bind Mounts

Volumes are managed by Docker (stored under `/var/lib/docker/volumes/` on Linux) and are the recommended way to persist data. They survive container removal and can be shared between containers. Bind mounts map a host path directly into a container, which is useful for development (hot reload) but fragile in production. `tmpfs` mounts store data in host memory only, which is useful for sensitive temporary data.
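The three mount types side by side, as a guarded sketch (assumes a local Docker daemon; `pgdata`, `my-dev-image`, and `my-service` are hypothetical names):

```shell
# Named volume: Docker manages the storage location.
#   docker volume create pgdata
#   docker run -d -v pgdata:/var/lib/postgresql/data postgres:16
# Bind mount: maps a host path (dev-time hot reload).
#   docker run -v "$PWD/src:/app/src" my-dev-image
# tmpfs: memory-only, gone when the container stops.
#   docker run --tmpfs /run/secrets:rw,size=16m my-service
docker volume ls 2>/dev/null || echo "docker daemon not available"
```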

Resource Limits with cgroups

Without limits, a runaway container can starve the host. Docker exposes cgroup controls via flags on `docker run`:

```bash
# Limit to 512 MB RAM and 1.5 CPU cores
docker run --memory=512m --cpus=1.5 my-service
```

```yaml
# Kubernetes equivalent in a Pod spec
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1500m"
```
⚠️

OOMKilled: the silent failure

If a container exceeds its memory limit, the kernel OOM killer terminates it — often with no application-level log. Always set memory limits in production and monitor for `OOMKilled` exit codes in your orchestrator.
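You can check for an OOM kill after the fact via the container's recorded state (assumes a Docker daemon; `my-service` is a hypothetical container name):

```shell
# Exit code 137 = 128 + 9 (SIGKILL), the usual signature of an OOM kill.
docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' my-service \
  2>/dev/null || echo "container 'my-service' not found"

# In Kubernetes, look for the terminated reason on the pod:
#   kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```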

Container Registries

A registry stores and distributes Docker images. Docker Hub is the public default. Production teams run private registries — AWS ECR, Google Artifact Registry, GitHub Container Registry — to avoid pull-rate limits, improve latency, and control access. Images are referenced as `registry/namespace/name:tag` (e.g., `gcr.io/my-project/api:v1.2.3`).
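The tag-and-push flow in full, as a sketch (registry path and tags are the hypothetical ones from the example above; pushing requires a prior `docker login`):

```shell
# Retag a local image with its fully qualified registry reference, then push.
#   docker tag api:latest gcr.io/my-project/api:v1.2.3
#   docker push gcr.io/my-project/api:v1.2.3
# Any authorized host can then pull by the same reference:
#   docker pull gcr.io/my-project/api:v1.2.3
docker --version 2>/dev/null || echo "docker CLI not installed"
```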

💡

Interview Tip

When an interviewer asks 'how would you containerize this service?' walk through: (1) base image choice (distroless or alpine for small attack surface), (2) multi-stage build to separate build and runtime, (3) layer ordering for cache efficiency, (4) non-root user, (5) resource limits in the orchestrator. This signals production-readiness awareness.
