Unit 4 — routing baselines, residual post-processing, and a stable countdown
Pickup & Trip ETA Prediction
Routing (Unit 1) answers "which path?" This unit answers the harder question the rider actually stares at: how long will that path take, right now? The same road can take four minutes at 2 a.m. and eleven at 8:30. A great ETA sets expectations, feeds dispatch scoring, and anchors the fare; a jittery or wrong one erodes trust faster than almost anything else in the app.
Uber frames ETA as a hybrid problem: a physical routing model produces a baseline, then a machine-learned model predicts the residual between that baseline and reality. You will build exactly that ladder — baseline → congestion model → residual post-processing → smoothed display — implementing each piece, checking it with python test.py, and unlocking the next with a deterministic checkpoint.
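The hybrid ladder above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Uber's implementation. The Segment fields, the hour-keyed congestion_factor table and its values, and the residual_model callable are all hypothetical stand-ins. The smoothed-display rung is left out here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    length_m: float       # segment length in meters
    free_flow_mps: float  # free-flow speed in meters per second


def baseline_eta_s(route: List[Segment]) -> float:
    """Physical baseline: free-flow traversal time, summed over segments."""
    return sum(seg.length_m / seg.free_flow_mps for seg in route)


def congestion_factor(hour: int) -> float:
    """Hypothetical time-of-day slowdown multiplier; 1.0 means free flow."""
    rush_hours = {7: 1.8, 8: 2.5, 9: 1.9, 17: 2.2, 18: 2.4}  # illustrative values only
    return rush_hours.get(hour, 1.0)


def predict_eta_s(route: List[Segment], hour: int,
                  residual_model: Callable[[float, int], float]) -> float:
    """Baseline -> congestion model -> learned residual, clamped at zero."""
    congested_s = baseline_eta_s(route) * congestion_factor(hour)
    return max(0.0, congested_s + residual_model(congested_s, hour))


route = [Segment(1200.0, 12.0), Segment(800.0, 8.0)]
print(predict_eta_s(route, 8, lambda eta_s, hour: 30.0))
```

Take a two-segment route: 1200 m at 12 m/s plus 800 m at 8 m/s gives a 200 s baseline. At hour 8, the illustrative 2.5x factor raises that to 500 s, and a residual model returning +30 s yields 530 s. Keeping the residual as a separate callable mirrors the split described above. The physics stays interpretable, and the learned part only has to model what the graph misses.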
Sub-unit 1 of 20
The problem: predicting arrival time
Produce an accurate pickup ETA (driver → rider) and trip ETA (pickup → destination), keep both current as traffic and the driver's position change, and present them as a stable countdown. All of this must happen within a few milliseconds, because this is the highest-QPS prediction in the whole system.
Functional
- Estimate pickup and trip ETA for any origin/destination pair.
- Update the estimate as the driver moves and traffic shifts.
- Feed ETAs to dispatch (Unit 3), pricing (Unit 5), and matching (Unit 6).
- Present a smooth, non-flickering countdown to the rider.
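The last two functional requirements can be met together. Let the displayed value tick down with wall-clock time, and nudge it toward each fresh model estimate by a bounded amount. The sketch below assumes a hypothetical CountdownSmoother class and an arbitrary 30-second cap per refresh. It illustrates rate-limited smoothing and is not the app's actual display logic.

```python
from typing import Optional


class CountdownSmoother:
    """Hypothetical display smoother: the shown ETA counts down with wall-clock
    time and moves toward each fresh model estimate by at most max_step_s."""

    def __init__(self, max_step_s: float = 30.0) -> None:
        self.max_step_s = max_step_s
        self.shown_s: Optional[float] = None  # value currently on screen (seconds)
        self.last_t: float = 0.0              # wall-clock time of last refresh (seconds)

    def update(self, now_s: float, model_eta_s: float) -> float:
        if self.shown_s is None:
            # First estimate: nothing to smooth against yet.
            self.shown_s, self.last_t = model_eta_s, now_s
            return self.shown_s
        # Natural countdown: subtract the time that passed since the last refresh.
        expected_s = max(0.0, self.shown_s - (now_s - self.last_t))
        # Move toward the fresh estimate, but never by more than max_step_s.
        step = max(-self.max_step_s, min(self.max_step_s, model_eta_s - expected_s))
        self.shown_s = max(0.0, expected_s + step)
        self.last_t = now_s
        return self.shown_s


smoother = CountdownSmoother(max_step_s=30.0)
for t, estimate in [(0.0, 600.0), (10.0, 590.0), (20.0, 700.0), (30.0, 700.0)]:
    print(t, smoother.update(t, estimate))
```

Start from a 600 s estimate. A refresh 10 s later that agrees with the countdown shows 590 s. If the model then jumps to 700 s, the screen moves only to 610 s and keeps climbing 30 s per refresh until it converges. A hard rate limit bounds the worst-case jump directly. An exponential moving average, by contrast, would still pass through a fraction of an arbitrarily large spike.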
Non-functional
- Return an ETA within a few milliseconds (highest-QPS model at Uber).
- Mean absolute error kept within a small fraction of the true duration throughout the day.
- Stable display — no large jumps between refreshes.
- Continuously recalibrate from completed trips.
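Two of these requirements can be turned into checkable numbers. The sketch below assumes one reading of the accuracy target: MAE divided by mean true duration, computed separately for each hour so that a weak rush hour cannot hide behind quiet nights. It adds a jump metric that measures how far each displayed value departs from a pure countdown. Both function names and the (hour, predicted_s, actual_s) record shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def relative_mae_by_hour(trips: Iterable[Tuple[int, float, float]]) -> Dict[int, float]:
    """trips: (hour, predicted_s, actual_s). Returns MAE / mean true duration per hour."""
    abs_err: Dict[int, float] = defaultdict(float)
    true_total: Dict[int, float] = defaultdict(float)
    for hour, predicted_s, actual_s in trips:
        abs_err[hour] += abs(predicted_s - actual_s)
        true_total[hour] += actual_s
    # (sum|err| / n) / (sum actual / n) simplifies to sum|err| / sum actual.
    return {h: abs_err[h] / true_total[h] for h in true_total if true_total[h] > 0}


def max_display_jump_s(shown_s: List[float], refresh_s: float) -> float:
    """Largest departure from a pure countdown between consecutive refreshes."""
    return max(
        (abs(cur - max(0.0, prev - refresh_s)) for prev, cur in zip(shown_s, shown_s[1:])),
        default=0.0,
    )


trips = [(2, 240.0, 250.0), (2, 260.0, 250.0), (8, 540.0, 600.0), (8, 660.0, 600.0)]
print(relative_mae_by_hour(trips))
print(max_display_jump_s([600.0, 590.0, 610.0, 600.0], refresh_s=10.0))
```

In this toy data, the quiet hour scores 0.04 and hour 8 scores 0.1, so a target of "a small fraction" would be judged against the worst hour rather than the daily average.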
Constraints
- True segment speeds vary by time of day, weather, and incidents.
- Live probe data is noisy and arrives with delay.
- The exact route a driver takes is not known in advance.
- The map is a model — it cannot perfectly capture conditions on the ground.
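The noisy, delayed probe constraint suggests a simple per-segment fusion. Take the median of recent probe speeds to damp outliers, then trust that reading less the older it is, falling back toward a historical speed. This is a minimal sketch: the half-life decay, the 300-second default, and the fused_speed_mps name are illustrative assumptions, not anything the unit specifies.

```python
from statistics import median
from typing import List, Tuple


def fused_speed_mps(historical_mps: float,
                    probes: List[Tuple[float, float]],
                    half_life_s: float = 300.0) -> float:
    """Blend a historical speed prior with recent probe readings.

    probes: (speed_mps, age_s) pairs for one segment. The median damps noisy
    outliers; the probe weight halves for every half_life_s of reporting delay.
    """
    if not probes:
        return historical_mps  # no live data: fall back to the prior
    probe_mps = median(speed for speed, _ in probes)
    probe_age_s = median(age for _, age in probes)
    weight = 0.5 ** (probe_age_s / half_life_s)  # 1.0 when fresh, toward 0 when stale
    return weight * probe_mps + (1.0 - weight) * historical_mps


print(fused_speed_mps(10.0, [(4.0, 300.0), (4.0, 300.0), (30.0, 300.0)]))
```

Consider a 10 m/s historical prior and three 300-second-old probes reading 4, 4, and 30 m/s. The median ignores the 30 m/s outlier, and the half-weighted blend gives 7 m/s.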
That last constraint is the crux. As Uber's Maps team puts it, the map is not the terrain: a road graph is a model, and even a perfect shortest-path query returns an ETA conditioned on a route the rider and driver may not actually take. The entire unit is about closing the gap between the graph's answer and the real world.
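Closing that gap starts with measuring it on completed trips: the residual is the actual duration minus the graph's ETA. A deliberately tiny stand-in for the learned residual model is a per-hour exponentially weighted mean of those residuals, updated as each trip finishes. It also illustrates continuous recalibration. The HourlyResidual class and its alpha parameter are hypothetical, and a production model would condition on far richer features than the hour.

```python
from typing import Dict


class HourlyResidual:
    """Hypothetical residual model: per-hour EWMA of (actual - graph ETA),
    updated online as each trip completes."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha                  # learning rate: weight of the newest trip
        self.bias_s: Dict[int, float] = {}  # hour of day -> learned correction (seconds)

    def observe(self, hour: int, graph_eta_s: float, actual_s: float) -> None:
        residual_s = actual_s - graph_eta_s
        prev = self.bias_s.get(hour, 0.0)
        self.bias_s[hour] = prev + self.alpha * (residual_s - prev)

    def predict(self, hour: int, graph_eta_s: float) -> float:
        # Graph answer plus the learned correction; hours never observed get none.
        return max(0.0, graph_eta_s + self.bias_s.get(hour, 0.0))


model = HourlyResidual(alpha=0.5)
model.observe(8, graph_eta_s=500.0, actual_s=600.0)
model.observe(8, graph_eta_s=500.0, actual_s=600.0)
print(model.predict(8, 500.0), model.predict(2, 200.0))
```

Take alpha = 0.5 and two hour-8 trips that each took 600 s against a 500 s graph ETA. They move the hour-8 correction to 50 s and then 75 s, so the next 500 s graph ETA at that hour becomes 575 s. Hours with no completed trips pass the graph's answer through unchanged.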
