Data Storage

Everything about persisting data: relational databases, NoSQL variants, sharding, replication, and choosing the right storage for your workload.

8 lessons, ~110 min

What you'll learn

ACID properties, normalization, indexing strategies (B-tree, hash), query optimization, and when relational databases are the right choice.

Four NoSQL families: document stores (MongoDB), key-value (Redis), wide-column (Cassandra), and graph databases (Neo4j). When to use each.

Horizontal vs vertical partitioning, shard key selection, range vs hash partitioning, hot spots, and rebalancing strategies.

Master-slave, master-master, and quorum-based replication. Replication lag, conflict resolution, and trade-offs between consistency and availability.

A practical decision framework for choosing between SQL and NoSQL: data model, query patterns, consistency needs, and scaling requirements.

Purpose-built databases for time-stamped data, such as InfluxDB and TimescaleDB. Write optimization, retention policies, and downsampling.

Structured vs unstructured data storage: data warehouses (Redshift, BigQuery) vs data lakes (S3 + Spark). ETL, ELT, and the lakehouse pattern.

How object storage works: S3-compatible APIs, consistency guarantees, versioning, lifecycle policies, and when to use it instead of block or file storage.