qodebase logoqodebase
← qodebase

Unit 2 — geo-indexing, k-d trees, quad-trees, H3, and ETA ranking

Nearest Drivers & Spatial Indexing

A rider drops a pin. Within ~50 milliseconds you must answer a deceptively simple question: which drivers should we even consider for this trip? Thousands of cars are online in the metro, the answer changes every second as they move, and "closest" is a trap — the car 200 m away may be across a bay with no bridge, while the genuinely fastest pickup is a kilometre down a highway on-ramp.

This unit builds the candidate-retrieval layer that sits between routing (Unit 1) and dispatch (Unit 3). You will implement four things from scratch — a uniform-grid filter, a k-d tree, a quad-tree, and a two-stage ETA pipeline — run them against fixtures with python test.py, and answer deterministic checkpoints to unlock each step.

Sub-unit 1 of 23

The problem: candidate retrieval for dispatch

The job is candidate generation, not the final assignment. Given a pickup point and a live fleet of thousands, return the top-kk drivers most likely to give a short pickup — ranked by road ETA, not straight-line distance — fast enough to run on every single request, continuously, across every city.

Functional

  • Return the top-k drivers for a pickup point, ranked by pickup ETA.
  • Reflect live driver positions, updated every few seconds.
  • Exclude unreachable drivers (wrong side of a barrier, disconnected component).
  • Hand a small, ranked candidate set to the dispatch layer.

Non-functional

  • p99 under 50 ms for the whole retrieval.
  • Scale to 10k+ online drivers per city.
  • Cheap enough to run on every rider request.
  • Degrade gracefully — never scan the whole fleet on the hot path.

Constraints

  • A full shortest-path query per driver is far too slow.
  • Driver positions are a high-churn stream.
  • Straight-line distance does not equal road travel time.

The tension is the same one from Unit 1, one layer up: the fleet is large, the query budget is tiny, and the volume is enormous. The resolution is a two-stage funnel — a microsecond spatial filter that throws away 99% of the fleet, followed by an exact ETA ranking on the survivors.

candidates=GeoFilter(D,p,r)microseconds    top k    ETARank(candidates,p)milliseconds\text{candidates} = \underbrace{\text{GeoFilter}(D,\, p,\, r)}_{\text{microseconds}} \;\xrightarrow{\;\text{top } k'\;}\; \underbrace{\text{ETARank}(\text{candidates},\, p)}_{\text{milliseconds}}

Everything in this unit is about building those two stages well.

Finished reading? Mark this sub-unit complete to unlock the next.