Module 3
Everything about persisting data: relational databases, NoSQL variants, sharding, replication, and choosing the right storage for your workload.
ACID properties, normalization, indexing strategies (B-tree, hash), query optimization, and when relational databases are the right choice.
Four NoSQL families: document stores (MongoDB), key-value (Redis), wide-column (Cassandra), and graph databases (Neo4j). When to use each.
Horizontal vs vertical partitioning, shard key selection, range vs hash partitioning, hot spots, and rebalancing strategies.
Master-slave, master-master, and quorum-based replication. Replication lag, conflict resolution, and trade-offs between consistency and availability.
A practical decision framework for choosing between SQL and NoSQL: data model, query patterns, consistency needs, and scaling requirements.
Purpose-built databases for time-stamped data: InfluxDB, TimescaleDB. Write optimization, retention policies, and downsampling.
Structured vs unstructured data storage: data warehouses (Redshift, BigQuery) vs data lakes (S3 + Spark). ETL, ELT, and the lakehouse pattern.
How object storage works: S3-compatible APIs, consistency models (eventual vs strong read-after-write, which Amazon S3 has provided since December 2020), versioning, lifecycle policies, and when to use it vs block/file storage.
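
As a small taste of the sharding lesson listed above, the range vs hash partitioning trade-off can be sketched in a few lines. This is only an illustrative sketch: the shard count, the range boundaries, and the function names `range_shard` and `hash_shard` are assumptions made for this example, not part of any particular database.

```python
import bisect
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration

# Hypothetical range boundaries: shard 0 holds ids below 250,
# shard 1 holds 250-499, shard 2 holds 500-749, shard 3 holds the rest.
RANGE_UPPER_BOUNDS = [250, 500, 750]


def range_shard(user_id: int) -> int:
    """Range partitioning: find the first range whose bound exceeds the key."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)


def hash_shard(user_id: int) -> int:
    """Hash partitioning: a stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Simulate a burst of writes for the 100 most recently created ids.
recent_ids = range(900, 1000)
range_load = Counter(range_shard(i) for i in recent_ids)
hash_load = Counter(hash_shard(i) for i in recent_ids)

print("range partitioning load per shard:", dict(range_load))
print("hash partitioning load per shard: ", dict(hash_load))
```

With sequential ids, range partitioning sends the whole burst to the last shard (a hot spot), while hashing spreads it across shards at the cost of efficient range scans. Note that plain modulo hashing remaps most keys when `NUM_SHARDS` changes, which is one reason rebalancing strategies get their own discussion.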