Time-Series Databases
Purpose-built databases for time-stamped data: InfluxDB, TimescaleDB. Write optimization, retention policies, and downsampling.
What Is Time-Series Data?
Time-series data is a sequence of measurements indexed by time. Every data point has a timestamp, a metric name, and a value — plus optional tags (metadata). Examples include server CPU usage every 10 seconds, stock prices every millisecond, IoT sensor readings every minute, and application request latency tracked over time. The defining characteristic: writes always append new timestamps; old data is rarely updated or deleted.
Why Not Use a General-Purpose Database?
You *can* store time-series data in PostgreSQL, but at scale it becomes painful. Consider a monitoring system writing 1 million metrics per second. In PostgreSQL: B-tree index maintenance on a rapidly growing table becomes expensive, range scans over time windows degrade, and storing billions of rows requires constant maintenance (VACUUM, index bloat). Time-series databases address these problems with purpose-built storage engines.
Key Time-Series Databases
| Database | Model | Best For | Notes |
|---|---|---|---|
| InfluxDB | Custom TSM engine | DevOps metrics, IoT, real-time analytics | Flux query language; native retention policies |
| TimescaleDB | PostgreSQL extension (hypertables) | SQL familiarity, existing PG stack | Full SQL, JOINs to relational data, automatic partitioning |
| Prometheus | Pull-based, in-memory TSDB | Kubernetes / infrastructure monitoring | Short retention; pair with Thanos/Cortex for long-term storage |
| Apache Druid | Columnar, pre-aggregated | Sub-second analytics on massive event streams | Used by Lyft, Airbnb for user-facing analytics |
| Amazon Timestream | Serverless AWS TSDB | AWS-native IoT/ops monitoring | Automatic tiering from memory → SSD → S3 |
Internal Architecture: Why TSDBs Are Fast
Time-series databases are optimized for sequential writes (always appending to the latest time window) and time-range reads (give me all CPU readings for the last hour). They achieve this through:
- Time-partitioned storage: Data is stored in time-bucketed chunks (e.g., 1-hour or 1-day chunks). Queries over a time range only touch the relevant chunks.
- Columnar compression: Consecutive timestamps differ by small deltas — highly compressible with delta encoding. Values (floats) compress well with XOR encoding (Gorilla algorithm, used by Prometheus).
- Automatic chunk expiration: Old chunks can be dropped atomically (one file delete) instead of row-by-row deletes, which would cause index fragmentation.
Retention Policies and Downsampling
Raw time-series data grows unboundedly. A monitoring system collecting 1M metrics/second generates 86 billion data points per day. Retention policies automatically expire data older than N days. Downsampling (or rollup) aggregates fine-grained data into coarser granularity before expiring the raw data — keeping 30 days of raw 10-second data, 1 year of 1-minute averages, and 5 years of 1-hour averages.
-- TimescaleDB: create a hypertable partitioned by time
SELECT create_hypertable('cpu_metrics', 'time');
-- Continuous aggregate: pre-compute 1-minute averages
CREATE MATERIALIZED VIEW cpu_1min
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 minute', time) AS bucket,
host,
AVG(cpu_percent) AS avg_cpu,
MAX(cpu_percent) AS max_cpu
FROM cpu_metrics
GROUP BY bucket, host;
-- Retention policy: drop raw data older than 30 days
SELECT add_retention_policy('cpu_metrics', INTERVAL '30 days');TimescaleDB for SQL Teams
If your team already uses PostgreSQL, TimescaleDB is the lowest-friction time-series solution. It's a PostgreSQL extension — same connection strings, same SQL, same tooling. Hypertables partition automatically by time, and continuous aggregates pre-compute rollups. You get 10-100x better time-series performance with zero new infrastructure.
Interview Tip
Time-series databases rarely appear as the central topic in system design interviews, but they show up as a component. If asked to design a metrics platform, monitoring system, or IoT data pipeline, mention: 'For the time-series data, I'd use InfluxDB or TimescaleDB with a 30-day raw retention policy and continuous aggregates for historical trends.' This shows architectural awareness without getting lost in a low-weight topic.