Unit 4 — fixed window, sliding window, and token buckets at scale

Rate Limiting

A single buggy integration can hammer your API with 50,000 requests a second, starving every other merchant. A credential-stuffing script can probe login endpoints relentlessly. The defense that keeps a shared platform fair and alive is the rate limiter: a sub-millisecond decision, made on every request, about whether to serve it or shed it.

This unit builds the three rate limiters every backend engineer should know — fixed window, sliding window, and token bucket — shows where the cheap one breaks, and benchmarks the trade-off between accuracy and memory.

Sub-unit 1 of 6

The product problem

Functional

Limit requests per API key per time window.
Reject over-limit requests with HTTP 429 (Too Many Requests) and a Retry-After header.
Allow short bursts where the policy permits them.
Apply limits consistently across every API server.
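
To make the first two requirements concrete, here is a minimal sketch of a per-key limit check that answers with a status code and a Retry-After value. It uses the fixed-window approach named in the unit title. The class name `FixedWindowLimiter`, its `check` method, and the in-memory dict are all hypothetical stand-ins; a real deployment would keep the counters in shared state so every API server sees the same numbers.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical in-memory limiter; the dict stands in for shared state."""
    limit: int      # max requests per key per window
    window_s: int   # window length in seconds
    counts: dict = field(default_factory=dict)  # (api_key, window index) -> count

    def check(self, api_key: str, now: float) -> tuple[int, dict]:
        """Return (HTTP status, response headers) for one request."""
        window = int(now // self.window_s)
        slot = (api_key, window)
        count = self.counts.get(slot, 0)
        if count >= self.limit:
            # Whole seconds until the current window ends, rounded up.
            retry_after = math.ceil((window + 1) * self.window_s - now)
            return 429, {"Retry-After": str(max(retry_after, 1))}
        # Under the limit: count the request and serve it.
        # Note: this sketch never evicts counters from old windows.
        self.counts[slot] = count + 1
        return 200, {}


limiter = FixedWindowLimiter(limit=2, window_s=60)
first = limiter.check("key-a", now=100.0)
second = limiter.check("key-a", now=101.0)
third = limiter.check("key-a", now=102.5)   # over the limit for this window
```

With a limit of 2 per 60-second window, the third call is rejected, and its Retry-After value is the number of seconds left until the window rolls over at t = 120.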

Non-functional

A limit decision in under 1 ms.
Correct when distributed across 50+ API servers sharing state.
Handle 500,000 requests/second in aggregate.
Minimal memory per tracked key.
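
A quick back-of-the-envelope calculation shows what these targets imply. The throughput and server figures come from the list above; the per-key byte counts are assumptions for illustration only (8-byte integers and timestamps, no store overhead), and `bytes_per_key` is a hypothetical helper. The sliding-window entry assumes the log variant that stores one timestamp per request.

```python
TOTAL_RPS = 500_000   # aggregate target from the requirements
SERVERS = 50          # lower bound implied by "50+"

# With exactly 50 servers, each one makes this many limit decisions per second.
per_server_rps = TOTAL_RPS / SERVERS


def bytes_per_key(limit: int, algorithm: str) -> int:
    """Rough payload size per tracked key (assumed 8-byte values, no overhead)."""
    if algorithm == "fixed_window":
        return 8            # a single counter
    if algorithm == "sliding_log":
        return 8 * limit    # up to one timestamp per allowed request
    if algorithm == "token_bucket":
        return 16           # token count plus last-refill timestamp
    raise ValueError(f"unknown algorithm: {algorithm}")


# Hypothetical policy: 1,000 requests per window per key.
log_bytes = bytes_per_key(1_000, "sliding_log")
counter_bytes = bytes_per_key(1_000, "fixed_window")
```

Under these assumptions, an exact log costs memory proportional to the limit, while a counter or a bucket stays constant per key. That gap is the accuracy-versus-memory trade-off the unit goes on to benchmark.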

Constraints

State is shared (e.g. Redis) across many stateless app servers.
Clocks across servers are not perfectly synchronized.
Bursty, adversarial traffic is the norm, not the exception.
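
The clock constraint matters because window-based limiters derive the window from a timestamp. The sketch below, with a hypothetical `window_index` helper and an assumed 60-second window, shows two servers whose clocks differ by half a second placing the same instant in different windows.

```python
WINDOW_S = 60  # assumed window length in seconds


def window_index(local_clock_s: float) -> int:
    """Map a server's local clock reading to a fixed-window index."""
    return int(local_clock_s // WINDOW_S)


true_time = 119.8                           # the same instant, seen by both servers
server_a = window_index(true_time)          # accurate clock
server_b = window_index(true_time + 0.5)    # clock running 0.5 s ahead
```

Here `server_a` lands in window 1 and `server_b` in window 2, so the two servers would increment different counters for the same key. One possible mitigation, offered as a design option rather than the unit's prescribed answer, is to take the timestamp from the shared store instead of from each app server.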
