Backend System Design

Mental Model: System design is about making intentional decisions under constraints. Every choice is a tradeoff — consistency vs availability, latency vs throughput, simplicity vs scalability. The goal is not the "perfect" architecture but the right architecture for your scale, team size, and problem. Start simple. Add complexity only when you hit a real bottleneck.

Fundamentals — Concepts Every Engineer Must Know

Topic	Core Concepts & Mental Model	Key Techniques	Tradeoffs & Failure Modes	Resources
CAP Theorem	In a distributed system, during a network partition you must choose between Consistency (all nodes return the same data) or Availability (every request gets a response). Partition tolerance is not optional in real networks — you always have it. The real choice is CP vs AP	CP systems (PostgreSQL, HBase, Zookeeper): reject writes during partition to stay consistent. AP systems (Cassandra, DynamoDB, CouchDB): accept writes during partition and sync later (eventual consistency)	❌ Most engineers misapply CAP. "CA" systems don't exist in distributed deployments — if there's a network, there are partitions. PACELC is a better model: even without partitions, you still trade latency (L) vs consistency (C)	Jepsen DB Tests, PACELC Paper
ACID vs BASE	ACID (Relational DBs): Atomicity (all-or-nothing), Consistency (valid state after tx), Isolation (tx don't interfere), Durability (committed data survives crash). BASE (NoSQL): Basically Available, Soft state, Eventually consistent — trades guarantees for scale	ACID: PostgreSQL, MySQL, SQLite — strong guarantees, best for financial/transactional data. BASE: Cassandra, DynamoDB — high write throughput, global distribution, tolerates inconsistency	❌ "NoSQL scales, SQL doesn't" is a myth. PostgreSQL handles millions of requests with proper indexing and connection pooling. Choose by consistency requirement, not hype	PostgreSQL ACID
Consistency Models	A spectrum from strong to weak. Stronger = safer but slower. Weaker = faster but complex application logic	Strong: every read returns latest write (expensive — requires coordination). Read-your-writes: you always see your own updates (session-level). Monotonic read: never read older data than you've already seen. Eventual: all nodes converge eventually (cheapest)	❌ Eventual consistency is often misunderstood — it doesn't mean "usually consistent." It means "will be consistent eventually if no new updates." Window of inconsistency can be milliseconds or minutes	Designing Data-Intensive Applications (Kleppmann) — Ch. 5
Latency vs Throughput	Latency = time for one request. Throughput = requests per second. Optimizing one often degrades the other. Batching improves throughput but increases latency. Parallel execution improves latency but reduces throughput efficiency	Measure p50/p95/p99 (percentiles) — not averages. Average hides tail latency. 1% of slow requests still affect 1 in 100 users. Use `autocannon` or `k6` to benchmark	❌ "Our average latency is 50ms" is misleading — p99 might be 5s. Tail latency causes cascading delays in systems where one request calls 3 services	Latency Numbers
Scalability — Vertical vs Horizontal	Vertical (Scale Up): bigger machine — more CPU, RAM. Simple, no code changes. Has a hard limit (biggest machine available). Horizontal (Scale Out): more machines — requires stateless design, load balancing, distributed state management	Vertical: fastest path to solving a bottleneck. Horizontal: required for true scale but needs: stateless services, shared session store (Redis), distributed cache, sharded DB	❌ Premature horizontal scaling is a trap — it adds massive complexity before you need it. Optimize and scale vertically first. Most apps never need horizontal scaling	The 12-Factor App
Idempotency	An operation is idempotent if calling it multiple times produces the same result as calling it once. Critical for: safe retries, webhook processing, payment systems	Idempotency key: client sends unique key per operation (`Idempotency-Key: uuid`). Server stores key → result mapping (Redis with TTL). If same key arrives again, return cached result instead of re-executing	❌ Non-idempotent retries in payment systems = double charges. Non-idempotent webhook handlers = duplicate order creation. Always design mutation endpoints (POST/DELETE) to be idempotent	Stripe Idempotency
Backpressure	When a consumer processes slower than a producer sends, the buffer fills up → memory exhaustion → crash. Backpressure signals the producer to slow down	Node.js Streams: `writable.write()` returns `false` → pause reading. Queues: bounded queue size → producer blocks when full. Rate limiting: reject excess requests with `429`	❌ Unbounded queues grow forever under sustained overload — always set `maxLength` on BullMQ queues. In Kafka: consumer lag is the backpressure signal — monitor it	Backpressure in Node.js

Networking & Traffic Management

Topic	Core Concepts & Mental Model	Tools & Libraries	Key Techniques	Tradeoffs & Failure Modes	Resources
DNS (Domain Name System)	Translates domain names to IP addresses. Hierarchical: Root → TLD (`.com`) → Authoritative nameserver. DNS responses are cached by TTL — a lower TTL means faster propagation of changes but more DNS lookup load	Route 53 (AWS), Cloudflare DNS, BIND. Record types: `A` (domain→IPv4), `AAAA` (IPv6), `CNAME` (alias), `MX` (email), `TXT` (verification), `NS` (nameserver)	DNS-based load balancing (Weighted routing in Route 53), Geo-DNS (route users to nearest region), failover routing	❌ Low TTL during migration only — high TTL normally (less DNS queries). DNS changes don't propagate instantly (old TTL must expire first). DNS is UDP-based — can be spoofed (DNSSEC mitigates this)	Route 53 docs
CDN (Content Delivery Network)	Globally distributed cache of static assets (JS, CSS, images, video). User requests go to nearest edge node — reduces latency from ~200ms (cross-continent) to ~10ms (edge). Two modes: Pull CDN (caches on first request), Push CDN (pre-uploaded before request)	Cloudflare (most popular), AWS CloudFront, Fastly, Akamai	Pull CDN: point DNS to CDN, set `Cache-Control` headers. Push CDN: upload assets to CDN origin before deploy. Cache invalidation: versioned filenames (`app.[hash].js`) > TTL-based invalidation	❌ Stale CDN cache = users seeing old JS/CSS after deploy. Versioned filenames are the only reliable cache-busting strategy. CDN doesn't help for authenticated/dynamic responses — only truly cacheable content	Cloudflare docs
Load Balancers — L4 vs L7	L4 (Transport): routes by IP + port. Faster, no content inspection. L7 (Application): routes by URL, headers, cookies. Enables: path-based routing, SSL termination, sticky sessions, health checks, A/B routing	Software: NGINX, HAProxy, Traefik. Cloud: AWS ALB (L7) / NLB (L4), GCP Load Balancer. Algorithms: Round Robin (default), Least Connections (unequal request durations), Weighted (heterogeneous servers), IP Hash (sticky — avoid for stateless apps)	SSL termination at LB (backend gets plain HTTP), health check endpoint (`/health` → remove unhealthy instance), connection draining (finish in-flight requests before removing instance)	❌ IP hash "sticky sessions" = unequal load distribution when one IP has many users. Round robin fails when requests have very different durations (use Least Connections instead). No connection draining = dropped requests on deploy	NGINX Load Balancing
Reverse Proxy	Sits in front of servers. Client talks to proxy, not directly to servers. Provides: SSL termination, compression, request routing, rate limiting, caching, DDoS protection	NGINX, Caddy, Cloudflare (as global proxy)	`proxy_pass` to upstream, `proxy_set_header X-Real-IP $remote_addr` (preserve client IP), `proxy_cache` (static content caching), gzip at proxy level (faster than at app level)	❌ Proxy adds one network hop — keep it on the same network as backend (not across regions). NGINX single config error = all traffic down. Test with `nginx -t` before reload	NGINX docs
API Gateway	Entry point for all client requests in a microservices architecture. Provides: authentication, rate limiting, request routing, protocol translation, response aggregation, logging	AWS API Gateway, Kong, Traefik, NGINX, Envoy	Auth at gateway (not every service), rate limit per route, route `/api/users` → Users Service, `/api/orders` → Orders Service, request/response transformation, API versioning	❌ API Gateway is a single point of failure — deploy in HA (multiple instances + LB). Gateway doing too much (business logic) = bottleneck. Too many hops (client → gateway → service A → service B) = latency accumulation	Kong docs

Database Architecture at Scale

Topic	Core Concepts & Mental Model	Tools & Libraries	Key Techniques	Tradeoffs & Failure Modes	Resources
Replication	Copy data across multiple nodes. Primary-Replica (Master-Slave): all writes go to primary, reads can go to any replica. Multi-Primary: any node accepts writes (conflict resolution needed). Synchronous: primary waits for replica ACK before confirming write (no data loss, higher latency). Asynchronous: primary confirms without waiting (lower latency, possible data loss on crash)	PostgreSQL streaming replication, MySQL replication, MongoDB replica sets, AWS RDS Multi-AZ	Read replicas for read-heavy workloads, failover to replica on primary crash, replica lag monitoring (`pg_stat_replication`)	❌ Replica lag: writes to primary, immediate read from replica = stale data (use "read-your-writes" consistency, read from primary for critical reads). Async replication data loss window on crash	PostgreSQL Replication
Sharding (Horizontal Partitioning)	Split a large table across multiple database instances. Each shard holds a subset of data. Hash sharding: `shard = hash(userId) % numShards` — even distribution. Range sharding: `shard = userId range` — supports range queries but hot spots. Directory sharding: lookup table maps key to shard	MongoDB (native sharding), Vitess (MySQL sharding), PostgreSQL + Citus, AWS DynamoDB (partitioned by key)	Shard key design (most important decision — wrong key = hot shard), cross-shard queries (avoid — expensive), resharding strategy when adding shards	❌ Hot shards: all traffic goes to one shard (e.g., sharding by `createdAt` • high insert rate = newest shard is hot). Cross-shard JOINs are expensive — denormalize or use application-level joins. Resharding (redistribution) is operationally painful	Designing Data-Intensive Applications — Ch. 6
Indexing at Scale	Without indexes = full table scan on every query. B-tree index handles: equality and range queries. Hash index: equality only, faster. Composite index: column order matters (most selective first). Covering index: query satisfied entirely by index (no table access)	PostgreSQL `EXPLAIN ANALYZE`, `CREATE INDEX CONCURRENTLY` (non-blocking), partial indexes (`WHERE deleted_at IS NULL`), GIN for arrays/JSON/full-text	Run `EXPLAIN ANALYZE` before and after index. Use `pg_stat_user_indexes` to find unused indexes. `CREATE INDEX CONCURRENTLY` in production (avoids table lock)	❌ Too many indexes = slow writes (every write updates all indexes). Index on low-cardinality column (e.g., `boolean`) is wasteful. Composite index `(a, b)` helps `WHERE a = ?` but NOT `WHERE b = ?` alone	Use the Index, Luke
Connection Pooling	Each DB connection is expensive (~1-10MB RAM, TCP overhead). Without pooling, a Node.js app with 100 concurrent requests = 100 DB connections → PostgreSQL collapses (default max 100-200 connections)	PgBouncer (connection pooler for PostgreSQL), Prisma built-in pool (`connection_limit`), `pg` pool (`new Pool({ max: 10 })`), RDS Proxy (AWS managed)	Pool size = `(num_cores * 2) + disk_spindles` (PgBouncer formula). Session vs transaction vs statement pooling modes. Monitor pool wait queue	❌ Pool too small = requests queue up waiting for a connection. Pool too large = DB overwhelmed. PgBouncer transaction mode incompatible with prepared statements — use session mode or disable prepared statements	PgBouncer docs
Database Selection Guide	Choose by data model + consistency requirement + scale need — not popularity	PostgreSQL: relational, ACID, complex queries, default choice. MongoDB: document, flexible schema, rapid iteration. Cassandra/DynamoDB: wide-column, AP, high write throughput, time-series. Redis: key-value, in-memory, caching/sessions. Neo4j: graph, relationship queries. ClickHouse: OLAP, analytics, columnar. Elasticsearch: full-text search, log analysis	Start with PostgreSQL. Add Redis for caching. Add Elasticsearch for search. Only add Cassandra/Dynamo when you have genuine write-scale problems	❌ Using MongoDB because "it's flexible" = inconsistent schema at scale. Using Cassandra because "it's fast" without understanding eventual consistency model = data bugs	Choosing a Database - Martin Fowler

Caching Architecture

Topic	Core Concepts & Mental Model	Tools & Libraries	Key Techniques	Tradeoffs & Failure Modes	Resources
Caching Layers	Cache exists at multiple levels — each has different scope, speed, and invalidation complexity	Browser: `Cache-Control` headers, service worker. CDN: Cloudflare, CloudFront — static assets + cacheable API responses. Server (in-memory): `node-cache`, `lru-cache` — single instance only. Distributed: Redis — shared across all instances. DB query cache: PostgreSQL shared buffers	Layer order: Browser → CDN → Server memory → Redis → Database. Cache hit at any layer = don't proceed to next	❌ Server in-memory cache in clustered apps = each instance has its own cache (inconsistent). Redis becomes a single point of failure — use Redis Sentinel or Cluster	Redis caching patterns
Cache-Aside (Lazy Loading)	Application manages the cache explicitly. Read: check cache → miss → read DB → write to cache → return. Write: update DB → invalidate cache key (don't update cache — it might be stale). Most common pattern	Redis + your DB	Cache key design (`users:${id}`, `posts:${userId}:page:${page}`), TTL per data type (user profile: long TTL, real-time feed: short TTL), cache warming on startup for hot data	❌ Cache miss penalty on first request (thundering herd if cache expires under high load — add jitter to TTLs). Stale cache if DB updates without invalidating cache key. Data consistency lag between DB and cache	—
Write-Through	Write to cache AND DB simultaneously on every write. Cache is always in sync with DB. Read is always a cache hit (after first write). More expensive writes	Redis + ORM transaction-like wrapper	Write to both in same operation, cache miss on first read (before any write), good for write-heavy + read-heavy workloads where consistency matters	❌ Every write hits both cache and DB — higher write latency. Cache stores data that may never be read (wasted memory). If DB write fails after cache write = inconsistency	—
Write-Back (Write-Behind)	Write to cache only, immediately return. Persist to DB asynchronously in the background. Fastest writes possible	Redis + background worker (BullMQ)	Batch DB writes for efficiency, reduce DB write load, use for metrics/counters/analytics where occasional data loss is acceptable	❌ Data loss risk: if cache crashes before DB write = lost data. Complex to implement correctly. Not suitable for critical transactional data	—
Cache Invalidation	The hardest problem in distributed systems. When data changes in DB, cache must be invalidated. Three strategies: TTL (automatic expiry), Event-based (invalidate on write), Version keys (`users:${id}:v${version}`)	`redis.del(key)` on write, `revalidateTag()` in Next.js, event-driven invalidation via message queue	Tag-based invalidation (invalidate all keys with tag `user:123`), cache stampede protection (mutex/lock on cache miss), short TTL + stale-while-revalidate	❌ Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things." Missing an invalidation = stale data visible to users. Mass invalidation under load = thundering herd hits DB	—

Async Processing & Messaging

Topic	Core Concepts & Mental Model	Tools & Libraries	Key Techniques	Tradeoffs & Failure Modes	Resources
Message Queues	Decouple producer (writes job) from consumer (processes job). Producer returns immediately — consumer processes async. Buffer against traffic spikes — queue absorbs burst, consumer processes at its pace	BullMQ + Redis (Node.js standard, job queues), AWS SQS (managed, simple), RabbitMQ (flexible routing). Queue vs Topic: queue = one consumer per message. Topic/pub-sub = multiple consumers per message	Job priorities, delayed jobs, retries with exponential backoff, dead-letter queue (DLQ) for failed jobs, job deduplication, worker concurrency tuning	❌ No DLQ = failed jobs silently disappear. No idempotency in consumer = duplicate processing on retry causes double-sends/double-charges. Unbounded queue = memory exhaustion under sustained overload. BullMQ requires Redis — plan for Redis HA	BullMQ docs
Event Streaming (Kafka)	Persistent, ordered, replayable event log. Unlike queues (message deleted after consumption), Kafka retains messages for configured retention period. Consumers track their own offset (position in log) — can replay from any point	Apache Kafka, AWS MSK (managed Kafka), Confluent Cloud, Redpanda (Kafka-compatible, faster)	Topics + Partitions (parallelism unit), Consumer Groups (each group gets all messages independently), offset management (`auto.commit` vs manual), Kafka Streams for real-time processing, Schema Registry for event schema evolution	❌ Partition count is set at creation — changing requires migration. Message ordering guaranteed only within a partition. Kafka is operationally complex — use Confluent or MSK in production. Wrong partition key = all messages in one partition = no parallelism	Kafka docs
Event-Driven Architecture	Services communicate via events (something that happened) instead of commands (do this). Loose coupling — publisher doesn't know consumers. Enables: audit log, event replay, multiple independent consumers	Kafka, AWS EventBridge, RabbitMQ (topic exchange), Redis pub/sub (simple, no persistence)	Event schema design: `{ eventType, version, timestamp, payload }`. Event versioning (backward-compatible changes only). Choreography (services react to events independently) vs Orchestration (central coordinator directs services)	❌ Eventually consistent by nature — services see events at different times. Event schema changes break consumers (use schema registry + versioning). Choreography = hard to trace end-to-end flow (use distributed tracing). Orchestration = central coordinator is a coupling point	Martin Fowler — Event-Driven
Saga Pattern	Manage distributed transactions across multiple services without a global ACID transaction (which doesn't exist across services). Sequence of local transactions with compensating actions if something fails	Orchestration-based (Temporal, AWS Step Functions) or Choreography-based (Kafka events)	Example: Order saga — `Reserve Inventory → Charge Payment → Notify Shipping`. If payment fails → `Release Inventory` (compensating action). Each step is idempotent	❌ Sagas are eventually consistent — intermediate states are visible. Compensating transactions are hard to design correctly (not always reversible — e.g., "send email" can't be unsent). Choreography-based sagas are hard to debug (events scattered across services)	Saga Pattern