Mental Model: System design is about making intentional decisions under constraints. Every choice is a tradeoff — consistency vs availability, latency vs throughput, simplicity vs scalability. The goal is not the "perfect" architecture but the right architecture for your scale, team size, and problem. Start simple. Add complexity only when you hit a real bottleneck.
| Topic | Core Concepts & Mental Model | Key Techniques | Tradeoffs & Failure Modes | Resources |
|---|---|---|---|---|
| CAP Theorem | In a distributed system, during a network partition you must choose between Consistency (all nodes return the same data) or Availability (every request gets a response). Partition tolerance is not optional in real networks — you always have it. The real choice is CP vs AP | CP systems (PostgreSQL, HBase, Zookeeper): reject writes during partition to stay consistent. AP systems (Cassandra, DynamoDB, CouchDB): accept writes during partition and sync later (eventual consistency) | ❌ Most engineers misapply CAP. "CA" systems don't exist in distributed deployments — if there's a network, there are partitions. PACELC is a better model: even without partitions, you still trade latency (L) vs consistency (C) | Jepsen DB Tests, PACELC Paper |
| ACID vs BASE | ACID (Relational DBs): Atomicity (all-or-nothing), Consistency (valid state after tx), Isolation (tx don't interfere), Durability (committed data survives crash). BASE (NoSQL): Basically Available, Soft state, Eventually consistent — trades guarantees for scale | ACID: PostgreSQL, MySQL, SQLite — strong guarantees, best for financial/transactional data. BASE: Cassandra, DynamoDB — high write throughput, global distribution, tolerates inconsistency | ❌ "NoSQL scales, SQL doesn't" is a myth. PostgreSQL handles millions of requests with proper indexing and connection pooling. Choose by consistency requirement, not hype | PostgreSQL ACID |
| Consistency Models | A spectrum from strong to weak. Stronger = safer but slower. Weaker = faster but complex application logic | Strong: every read returns latest write (expensive — requires coordination). Read-your-writes: you always see your own updates (session-level). Monotonic read: never read older data than you've already seen. Eventual: all nodes converge eventually (cheapest) | ❌ Eventual consistency is often misunderstood — it doesn't mean "usually consistent." It means "will be consistent eventually if no new updates." Window of inconsistency can be milliseconds or minutes | Designing Data-Intensive Applications (Kleppmann) — Ch. 5 |
| Latency vs Throughput | Latency = time for one request. Throughput = requests per second. Optimizing one often degrades the other. Batching improves throughput but increases latency. Parallel execution improves latency but reduces throughput efficiency | Measure p50/p95/p99 (percentiles) — not averages. Average hides tail latency. 1% of slow requests still affect 1 in 100 users. Use autocannon or k6 to benchmark |
❌ "Our average latency is 50ms" is misleading — p99 might be 5s. Tail latency causes cascading delays in systems where one request calls 3 services | Latency Numbers |
| Scalability — Vertical vs Horizontal | Vertical (Scale Up): bigger machine — more CPU, RAM. Simple, no code changes. Has a hard limit (biggest machine available). Horizontal (Scale Out): more machines — requires stateless design, load balancing, distributed state management | Vertical: fastest path to solving a bottleneck. Horizontal: required for true scale but needs: stateless services, shared session store (Redis), distributed cache, sharded DB | ❌ Premature horizontal scaling is a trap — it adds massive complexity before you need it. Optimize and scale vertically first. Most apps never need horizontal scaling | The 12-Factor App |
| Idempotency | An operation is idempotent if calling it multiple times produces the same result as calling it once. Critical for: safe retries, webhook processing, payment systems | Idempotency key: client sends unique key per operation (Idempotency-Key: uuid). Server stores key → result mapping (Redis with TTL). If same key arrives again, return cached result instead of re-executing |
❌ Non-idempotent retries in payment systems = double charges. Non-idempotent webhook handlers = duplicate order creation. Always design mutation endpoints (POST/DELETE) to be idempotent | Stripe Idempotency |
| Backpressure | When a consumer processes slower than a producer sends, the buffer fills up → memory exhaustion → crash. Backpressure signals the producer to slow down | Node.js Streams: writable.write() returns false → pause reading. Queues: bounded queue size → producer blocks when full. Rate limiting: reject excess requests with 429 |
❌ Unbounded queues grow forever under sustained overload — always set maxLength on BullMQ queues. In Kafka: consumer lag is the backpressure signal — monitor it |
Backpressure in Node.js |
| Topic | Core Concepts & Mental Model | Tools & Libraries | Key Techniques | Tradeoffs & Failure Modes | Resources |
|---|---|---|---|---|---|
| DNS (Domain Name System) | Translates domain names to IP addresses. Hierarchical: Root → TLD (.com) → Authoritative nameserver. DNS responses are cached by TTL — a lower TTL means faster propagation of changes but more DNS lookup load |
Route 53 (AWS), Cloudflare DNS, BIND. Record types: A (domain→IPv4), AAAA (IPv6), CNAME (alias), MX (email), TXT (verification), NS (nameserver) |
DNS-based load balancing (Weighted routing in Route 53), Geo-DNS (route users to nearest region), failover routing | ❌ Low TTL during migration only — high TTL normally (less DNS queries). DNS changes don't propagate instantly (old TTL must expire first). DNS is UDP-based — can be spoofed (DNSSEC mitigates this) | Route 53 docs |
| CDN (Content Delivery Network) | Globally distributed cache of static assets (JS, CSS, images, video). User requests go to nearest edge node — reduces latency from ~200ms (cross-continent) to ~10ms (edge). Two modes: Pull CDN (caches on first request), Push CDN (pre-uploaded before request) | Cloudflare (most popular), AWS CloudFront, Fastly, Akamai | Pull CDN: point DNS to CDN, set Cache-Control headers. Push CDN: upload assets to CDN origin before deploy. Cache invalidation: versioned filenames (app.[hash].js) > TTL-based invalidation |
❌ Stale CDN cache = users seeing old JS/CSS after deploy. Versioned filenames are the only reliable cache-busting strategy. CDN doesn't help for authenticated/dynamic responses — only truly cacheable content | Cloudflare docs |
| Load Balancers — L4 vs L7 | L4 (Transport): routes by IP + port. Faster, no content inspection. L7 (Application): routes by URL, headers, cookies. Enables: path-based routing, SSL termination, sticky sessions, health checks, A/B routing | Software: NGINX, HAProxy, Traefik. Cloud: AWS ALB (L7) / NLB (L4), GCP Load Balancer. Algorithms: Round Robin (default), Least Connections (unequal request durations), Weighted (heterogeneous servers), IP Hash (sticky — avoid for stateless apps) | SSL termination at LB (backend gets plain HTTP), health check endpoint (/health → remove unhealthy instance), connection draining (finish in-flight requests before removing instance) |
❌ IP hash "sticky sessions" = unequal load distribution when one IP has many users. Round robin fails when requests have very different durations (use Least Connections instead). No connection draining = dropped requests on deploy | NGINX Load Balancing |
| Reverse Proxy | Sits in front of servers. Client talks to proxy, not directly to servers. Provides: SSL termination, compression, request routing, rate limiting, caching, DDoS protection | NGINX, Caddy, Cloudflare (as global proxy) | proxy_pass to upstream, proxy_set_header X-Real-IP $remote_addr (preserve client IP), proxy_cache (static content caching), gzip at proxy level (faster than at app level) |
❌ Proxy adds one network hop — keep it on the same network as backend (not across regions). NGINX single config error = all traffic down. Test with nginx -t before reload |
NGINX docs |
| API Gateway | Entry point for all client requests in a microservices architecture. Provides: authentication, rate limiting, request routing, protocol translation, response aggregation, logging | AWS API Gateway, Kong, Traefik, NGINX, Envoy | Auth at gateway (not every service), rate limit per route, route /api/users → Users Service, /api/orders → Orders Service, request/response transformation, API versioning |
❌ API Gateway is a single point of failure — deploy in HA (multiple instances + LB). Gateway doing too much (business logic) = bottleneck. Too many hops (client → gateway → service A → service B) = latency accumulation | Kong docs |
| Topic | Core Concepts & Mental Model | Tools & Libraries | Key Techniques | Tradeoffs & Failure Modes | Resources |
|---|---|---|---|---|---|
| Replication | Copy data across multiple nodes. Primary-Replica (Master-Slave): all writes go to primary, reads can go to any replica. Multi-Primary: any node accepts writes (conflict resolution needed). Synchronous: primary waits for replica ACK before confirming write (no data loss, higher latency). Asynchronous: primary confirms without waiting (lower latency, possible data loss on crash) | PostgreSQL streaming replication, MySQL replication, MongoDB replica sets, AWS RDS Multi-AZ | Read replicas for read-heavy workloads, failover to replica on primary crash, replica lag monitoring (pg_stat_replication) |
❌ Replica lag: writes to primary, immediate read from replica = stale data (use "read-your-writes" consistency, read from primary for critical reads). Async replication data loss window on crash | PostgreSQL Replication |
| Sharding (Horizontal Partitioning) | Split a large table across multiple database instances. Each shard holds a subset of data. Hash sharding: shard = hash(userId) % numShards — even distribution. Range sharding: shard = userId range — supports range queries but hot spots. Directory sharding: lookup table maps key to shard |
MongoDB (native sharding), Vitess (MySQL sharding), PostgreSQL + Citus, AWS DynamoDB (partitioned by key) | Shard key design (most important decision — wrong key = hot shard), cross-shard queries (avoid — expensive), resharding strategy when adding shards | ❌ Hot shards: all traffic goes to one shard (e.g., sharding by createdAt • high insert rate = newest shard is hot). Cross-shard JOINs are expensive — denormalize or use application-level joins. Resharding (redistribution) is operationally painful |
Designing Data-Intensive Applications — Ch. 6 |
| Indexing at Scale | Without indexes = full table scan on every query. B-tree index handles: equality and range queries. Hash index: equality only, faster. Composite index: column order matters (most selective first). Covering index: query satisfied entirely by index (no table access) | PostgreSQL EXPLAIN ANALYZE, CREATE INDEX CONCURRENTLY (non-blocking), partial indexes (WHERE deleted_at IS NULL), GIN for arrays/JSON/full-text |
Run EXPLAIN ANALYZE before and after index. Use pg_stat_user_indexes to find unused indexes. CREATE INDEX CONCURRENTLY in production (avoids table lock) |
❌ Too many indexes = slow writes (every write updates all indexes). Index on low-cardinality column (e.g., boolean) is wasteful. Composite index (a, b) helps WHERE a = ? but NOT WHERE b = ? alone |
Use the Index, Luke |
| Connection Pooling | Each DB connection is expensive (~1-10MB RAM, TCP overhead). Without pooling, a Node.js app with 100 concurrent requests = 100 DB connections → PostgreSQL collapses (default max 100-200 connections) | PgBouncer (connection pooler for PostgreSQL), Prisma built-in pool (connection_limit), pg pool (new Pool({ max: 10 })), RDS Proxy (AWS managed) |
Pool size = (num_cores * 2) + disk_spindles (PgBouncer formula). Session vs transaction vs statement pooling modes. Monitor pool wait queue |
❌ Pool too small = requests queue up waiting for a connection. Pool too large = DB overwhelmed. PgBouncer transaction mode incompatible with prepared statements — use session mode or disable prepared statements | PgBouncer docs |
| Database Selection Guide | Choose by data model + consistency requirement + scale need — not popularity | PostgreSQL: relational, ACID, complex queries, default choice. MongoDB: document, flexible schema, rapid iteration. Cassandra/DynamoDB: wide-column, AP, high write throughput, time-series. Redis: key-value, in-memory, caching/sessions. Neo4j: graph, relationship queries. ClickHouse: OLAP, analytics, columnar. Elasticsearch: full-text search, log analysis | Start with PostgreSQL. Add Redis for caching. Add Elasticsearch for search. Only add Cassandra/Dynamo when you have genuine write-scale problems | ❌ Using MongoDB because "it's flexible" = inconsistent schema at scale. Using Cassandra because "it's fast" without understanding eventual consistency model = data bugs | Choosing a Database - Martin Fowler |
| Topic | Core Concepts & Mental Model | Tools & Libraries | Key Techniques | Tradeoffs & Failure Modes | Resources |
|---|---|---|---|---|---|
| Caching Layers | Cache exists at multiple levels — each has different scope, speed, and invalidation complexity | Browser: Cache-Control headers, service worker. CDN: Cloudflare, CloudFront — static assets + cacheable API responses. Server (in-memory): node-cache, lru-cache — single instance only. Distributed: Redis — shared across all instances. DB query cache: PostgreSQL shared buffers |
Layer order: Browser → CDN → Server memory → Redis → Database. Cache hit at any layer = don't proceed to next | ❌ Server in-memory cache in clustered apps = each instance has its own cache (inconsistent). Redis becomes a single point of failure — use Redis Sentinel or Cluster | Redis caching patterns |
| Cache-Aside (Lazy Loading) | Application manages the cache explicitly. Read: check cache → miss → read DB → write to cache → return. Write: update DB → invalidate cache key (don't update cache — it might be stale). Most common pattern | Redis + your DB | Cache key design (users:${id}, posts:${userId}:page:${page}), TTL per data type (user profile: long TTL, real-time feed: short TTL), cache warming on startup for hot data |
❌ Cache miss penalty on first request (thundering herd if cache expires under high load — add jitter to TTLs). Stale cache if DB updates without invalidating cache key. Data consistency lag between DB and cache | — |
| Write-Through | Write to cache AND DB simultaneously on every write. Cache is always in sync with DB. Read is always a cache hit (after first write). More expensive writes | Redis + ORM transaction-like wrapper | Write to both in same operation, cache miss on first read (before any write), good for write-heavy + read-heavy workloads where consistency matters | ❌ Every write hits both cache and DB — higher write latency. Cache stores data that may never be read (wasted memory). If DB write fails after cache write = inconsistency | — |
| Write-Back (Write-Behind) | Write to cache only, immediately return. Persist to DB asynchronously in the background. Fastest writes possible | Redis + background worker (BullMQ) | Batch DB writes for efficiency, reduce DB write load, use for metrics/counters/analytics where occasional data loss is acceptable | ❌ Data loss risk: if cache crashes before DB write = lost data. Complex to implement correctly. Not suitable for critical transactional data | — |
| Cache Invalidation | The hardest problem in distributed systems. When data changes in DB, cache must be invalidated. Three strategies: TTL (automatic expiry), Event-based (invalidate on write), Version keys (users:${id}:v${version}) |
redis.del(key) on write, revalidateTag() in Next.js, event-driven invalidation via message queue |
Tag-based invalidation (invalidate all keys with tag user:123), cache stampede protection (mutex/lock on cache miss), short TTL + stale-while-revalidate |
❌ Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things." Missing an invalidation = stale data visible to users. Mass invalidation under load = thundering herd hits DB | — |
| Topic | Core Concepts & Mental Model | Tools & Libraries | Key Techniques | Tradeoffs & Failure Modes | Resources |
|---|---|---|---|---|---|
| Message Queues | Decouple producer (writes job) from consumer (processes job). Producer returns immediately — consumer processes async. Buffer against traffic spikes — queue absorbs burst, consumer processes at its pace | BullMQ + Redis (Node.js standard, job queues), AWS SQS (managed, simple), RabbitMQ (flexible routing). Queue vs Topic: queue = one consumer per message. Topic/pub-sub = multiple consumers per message | Job priorities, delayed jobs, retries with exponential backoff, dead-letter queue (DLQ) for failed jobs, job deduplication, worker concurrency tuning | ❌ No DLQ = failed jobs silently disappear. No idempotency in consumer = duplicate processing on retry causes double-sends/double-charges. Unbounded queue = memory exhaustion under sustained overload. BullMQ requires Redis — plan for Redis HA | BullMQ docs |
| Event Streaming (Kafka) | Persistent, ordered, replayable event log. Unlike queues (message deleted after consumption), Kafka retains messages for configured retention period. Consumers track their own offset (position in log) — can replay from any point | Apache Kafka, AWS MSK (managed Kafka), Confluent Cloud, Redpanda (Kafka-compatible, faster) | Topics + Partitions (parallelism unit), Consumer Groups (each group gets all messages independently), offset management (auto.commit vs manual), Kafka Streams for real-time processing, Schema Registry for event schema evolution |
❌ Partition count is set at creation — changing requires migration. Message ordering guaranteed only within a partition. Kafka is operationally complex — use Confluent or MSK in production. Wrong partition key = all messages in one partition = no parallelism | Kafka docs |
| Event-Driven Architecture | Services communicate via events (something that happened) instead of commands (do this). Loose coupling — publisher doesn't know consumers. Enables: audit log, event replay, multiple independent consumers | Kafka, AWS EventBridge, RabbitMQ (topic exchange), Redis pub/sub (simple, no persistence) | Event schema design: { eventType, version, timestamp, payload }. Event versioning (backward-compatible changes only). Choreography (services react to events independently) vs Orchestration (central coordinator directs services) |
❌ Eventually consistent by nature — services see events at different times. Event schema changes break consumers (use schema registry + versioning). Choreography = hard to trace end-to-end flow (use distributed tracing). Orchestration = central coordinator is a coupling point | Martin Fowler — Event-Driven |
| Saga Pattern | Manage distributed transactions across multiple services without a global ACID transaction (which doesn't exist across services). Sequence of local transactions with compensating actions if something fails | Orchestration-based (Temporal, AWS Step Functions) or Choreography-based (Kafka events) | Example: Order saga — Reserve Inventory → Charge Payment → Notify Shipping. If payment fails → Release Inventory (compensating action). Each step is idempotent |
❌ Sagas are eventually consistent — intermediate states are visible. Compensating transactions are hard to design correctly (not always reversible — e.g., "send email" can't be unsent). Choreography-based sagas are hard to debug (events scattered across services) | Saga Pattern |