Unit 4 — fixed window, sliding window, and token buckets at scale
Rate Limiting
A single buggy integration can hammer your API with 50,000 requests a second, starving every other merchant. A credential-stuffing script can probe login endpoints relentlessly. The defense that keeps a shared platform fair and alive is the rate limiter: a sub-millisecond decision, made on every request, about whether to serve it or shed it.
This unit builds the three rate limiters every backend engineer should know — fixed window, sliding window, and token bucket — shows where the cheap one breaks, and benchmarks the trade-off between accuracy and memory.
Sub-unit 1 of 6
The product problem
Functional
- Limit requests per API key per time window.
- Reject over-limit requests with HTTP 429 and a Retry-After header.
- Allow short bursts where the policy permits them.
- Apply limits consistently across every API server.
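To make the first two requirements concrete, here is a minimal sketch of a per-key limit decision and how it might map to an HTTP response. It is not the design this unit arrives at: it uses a fixed window held in process memory, so it has no burst allowance and shares nothing across servers. The names `Decision`, `FixedWindowLimiter`, `check`, and `to_http`, and the limit of 2 requests per 60 seconds, are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Decision:
    allowed: bool
    retry_after: int  # whole seconds until the current window ends; 0 when allowed


@dataclass
class FixedWindowLimiter:
    limit: int            # max requests per key per window (hypothetical policy value)
    window_seconds: int   # window length in seconds
    # Maps (api_key, window index) -> request count.
    # In-memory only, and old windows are never evicted in this sketch.
    counts: dict = field(default_factory=dict)

    def check(self, api_key: str, now: float) -> Decision:
        window = int(now // self.window_seconds)
        bucket = (api_key, window)
        used = self.counts.get(bucket, 0)
        if used >= self.limit:
            # Over the limit: tell the caller when the window resets.
            window_end = (window + 1) * self.window_seconds
            return Decision(False, max(1, math.ceil(window_end - now)))
        self.counts[bucket] = used + 1
        return Decision(True, 0)


def to_http(decision: Decision) -> tuple[int, dict]:
    """Translate a decision into a status code and response headers."""
    if decision.allowed:
        return 200, {}
    return 429, {"Retry-After": str(decision.retry_after)}


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
for t in (10.0, 11.0, 12.5):
    print(to_http(limiter.check("key-a", now=t)))
```

The third call is rejected with a `Retry-After` value equal to the seconds left in the window, rounded up. Passing `now` in explicitly keeps the sketch deterministic and leaves open the question of which clock supplies it.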
Non-functional
- Make each limit decision in under 1 ms.
- Stay correct when distributed across 50+ API servers sharing state.
- Handle 500,000 requests/second in aggregate.
- Minimal memory per tracked key.
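A quick back-of-the-envelope calculation shows what these targets imply. The request rate, server count, and latency budget come from the list above. The number of tracked keys and the bytes of state per key are purely hypothetical placeholders, and a real store adds its own per-key overhead on top.

```python
AGGREGATE_RPS = 500_000      # from the requirements
API_SERVERS = 50             # lower bound from "50+ API servers"
LATENCY_BUDGET_S = 0.001     # 1 ms per decision

# Evenly spread load per API server.
per_server_rps = AGGREGATE_RPS / API_SERVERS

# Little's law: decisions in flight per server if each takes the full budget.
in_flight_per_server = per_server_rps * LATENCY_BUDGET_S

# If every decision costs one round trip to the shared store,
# the store sees the full aggregate rate.
store_ops_per_second = AGGREGATE_RPS

# Hypothetical sizing inputs, not from the requirements.
TRACKED_KEYS = 1_000_000     # assumed number of active API keys
BYTES_PER_KEY = 16           # assumed payload, e.g. a counter plus a timestamp
payload_megabytes = TRACKED_KEYS * BYTES_PER_KEY / 1_000_000

print(per_server_rps, in_flight_per_server, store_ops_per_second, payload_megabytes)
```

Under these assumptions each server handles 10,000 requests per second with about 10 decisions in flight, while the shared store absorbs all 500,000 operations per second. That is why the cost per decision and the bytes held per key matter more than the load on any single app server.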
Constraints
- State is shared (e.g. Redis) across many stateless app servers.
- Clocks across servers are not perfectly synchronized.
- Bursty, adversarial traffic is the norm, not the exception.
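The clock constraint is easiest to see near a window boundary. The sketch below assumes a 60-second fixed window and a hypothetical server B whose clock runs 0.2 seconds ahead. It compares stamping requests with each server's local clock against stamping them with one timestamp taken from the shared store (Redis, for instance, exposes a `TIME` command). The function `window_index` and all values are illustrative.

```python
def window_index(now: float, window_seconds: int) -> int:
    """Index of the fixed window that contains timestamp `now`."""
    return int(now // window_seconds)


WINDOW_SECONDS = 60
true_time = 119.9   # real time, 0.1 s before a window boundary
skew_b = 0.2        # hypothetical: server B's clock runs 0.2 s ahead

# Each server stamps the request with its own local clock.
local_a = window_index(true_time, WINDOW_SECONDS)
local_b = window_index(true_time + skew_b, WINDOW_SECONDS)

# Both servers use a single timestamp read from the shared store instead.
store_time = true_time
shared_a = window_index(store_time, WINDOW_SECONDS)
shared_b = window_index(store_time, WINDOW_SECONDS)

print(local_a, local_b)    # 1 2  -> the two servers count into different windows
print(shared_a, shared_b)  # 1 1  -> one clock source, one window
```

With local clocks, two requests that arrive at the same instant are charged to different windows, so a key can briefly exceed its limit. Using one clock source removes that disagreement at the cost of depending on the store for time as well as state.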
