Unit 4 — fixed window, sliding window, and token buckets at scale

Rate Limiting

A single buggy integration can hammer your API with 50,000 requests a second, starving every other merchant. A credential-stuffing script can probe login endpoints relentlessly. The defense that keeps a shared platform fair and alive is the rate limiter: a sub-millisecond decision, made on every request, about whether to serve it or shed it.

This unit builds the three rate limiters every backend engineer should know — fixed window, sliding window, and token bucket — shows where the cheap one breaks, and benchmarks the trade-off between accuracy and memory.

Sub-unit 1 of 6

The product problem

Functional

  • Limit requests per API key per time window.
  • Reject over-limit requests with HTTP 429 and a Retry-After header.
  • Allow short bursts where the policy permits them.
  • Apply limits consistently across every API server.

Non-functional

  • A limit decision in under 1 ms.
  • Correct when distributed across 50+ API servers sharing state.
  • Handle 500,000 requests/second in aggregate.
  • Minimal memory per tracked key.

Constraints

  • State is shared (e.g. Redis) across many stateless app servers.
  • Clocks across servers are not perfectly synchronized.
  • Bursty, adversarial traffic is the norm, not the exception.
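
To see why unsynchronized clocks matter, consider two servers that each derive the current window from their own local clock. The snippet below is an illustrative sketch only; the 60-second window, the 0.3-second skew, and the helper `window_id` are assumed values and names, not figures from this unit.

```python
def window_id(now: float, window_seconds: float) -> int:
    """Index of the fixed window that contains the timestamp `now`."""
    return int(now // window_seconds)


WINDOW_SECONDS = 60.0   # assumed window length
true_time = 119.9       # just before a window boundary
skew_b = 0.3            # assumed: server B's clock runs 0.3 s ahead

server_a_window = window_id(true_time, WINDOW_SECONDS)
server_b_window = window_id(true_time + skew_b, WINDOW_SECONDS)

# The same instant falls into two different windows, so the two servers
# would increment different counters for the same API key.
print(server_a_window, server_b_window)
```

Near a boundary, the servers disagree about which window a request belongs to, so a key can briefly be counted against two separate counters and exceed its limit.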