Unit 5: feature engineering and tree models under 50 ms
Fraud Detection & Risk Scoring
Before a charge from Unit 1 hits the card network, Stripe has to decide in under 50 milliseconds whether to let it through, challenge it, or block it. A stolen card can be tested in seconds; a legitimate customer on vacation looks suspicious if you only check the amount. The system that makes this call is a risk scorer. A naive rule tends to fail in both directions, blocking real customers while letting fraud through; a tree-based model is meant to narrow that gap.
This unit builds the feature pipeline, walks through decision trees and their ensembles, and benchmarks inference latency against accuracy.
Sub-unit 1 of 7
The product problem
Functional
- Score every transaction before authorization as approve, review, or block.
- Explain which features drove the decision (for disputes and model debugging).
- Update scores as new signals arrive (velocity, device, geography).
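The first two functional requirements can be sketched as one small function: turn a risk score into one of the three actions and keep the features that contributed most, so a reviewer can see why. This is a minimal sketch, not the scorer's actual design. The thresholds, the `Decision` type, and the `contributions` input (assumed to come from whatever explanation method the model provides) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
REVIEW_THRESHOLD = 0.30
BLOCK_THRESHOLD = 0.80


@dataclass
class Decision:
    action: str          # "approve", "review", or "block"
    score: float         # risk score in [0, 1]
    top_features: list   # feature names, largest absolute contribution first


def decide(score, contributions, top_k=3):
    """Map a risk score to an action and keep the top-k contributing features.

    contributions: dict of feature name -> signed contribution to the score
    (assumed input; how it is produced depends on the model).
    """
    if score >= BLOCK_THRESHOLD:
        action = "block"
    elif score >= REVIEW_THRESHOLD:
        action = "review"
    else:
        action = "approve"
    # Rank by magnitude so strong negative signals are reported too.
    ranked = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)
    return Decision(action, score, ranked[:top_k])


decision = decide(
    0.85,
    {"amount": 0.10, "velocity_1h": 0.50, "geo_mismatch": -0.30, "device_age": 0.05},
)
print(decision.action, decision.top_features)
```

When a new signal arrives (for example, an updated velocity count), calling `decide` again with the new score and contributions is enough to refresh the decision in this sketch.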
Non-functional
- Under 50 ms p99 inference latency per decision.
- 99.9% uptime — a scorer outage blocks all payments.
- Minimize false positives (rejecting legitimate transactions).
- Catch fraud before funds move (pre-authorization only).
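To make the latency target concrete, here is one way to check a p99 budget: time each scoring call, then take the 99th percentile of the samples. This is a rough sketch using a nearest-rank percentile; `score_fn` stands in for the real scorer, and the helper names are assumptions for illustration.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def measure_latencies_ms(score_fn, transactions):
    """Time each call to score_fn and return per-call latency in milliseconds."""
    latencies = []
    for txn in transactions:
        start = time.perf_counter()
        score_fn(txn)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_budget(latencies_ms, budget_ms=50.0):
    """True if the p99 latency is under the budget."""
    return percentile(latencies_ms, 99) < budget_ms


# One slow call in 100 is tolerated by a p99 target; two in 100 are not.
print(meets_budget([10.0] * 99 + [60.0]))      # True
print(meets_budget([10.0] * 98 + [60.0] * 2))  # False
```

The example shows why p99 is stricter than an average: both sample sets have a mean near 10 ms, but only the first stays within the budget.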
Constraints
- Features must be computable from data available at authorization time.
- Models must be explainable enough for human review.
- Training data is heavily imbalanced (fraud is rare).
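Two of these constraints lend themselves to a short illustration. The first function below computes a velocity feature using only events that happened strictly before the authorization time, so nothing from the future (or from the transaction being scored) leaks in. The second computes per-class weights with the common "balanced" heuristic, n_samples / (n_classes * class_count), so rare fraud examples count for more during training. Both are sketches under assumed inputs (timestamps in seconds, labels 0 for legitimate and 1 for fraud); the function names are hypothetical.

```python
from collections import Counter


def velocity_count(event_times, auth_time, window_s=3600.0):
    """Count events in the window_s seconds strictly before auth_time.

    Events at or after auth_time are excluded: they are not available
    when the authorization decision is made.
    """
    return sum(1 for t in event_times if auth_time - window_s <= t < auth_time)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Card events at t=100, 3000, 3700, 4000 seconds; scoring the charge at t=3700.
print(velocity_count([100, 3000, 3700, 4000], auth_time=3700))  # 2

# 99 legitimate transactions and 1 fraud: the fraud example gets weight 50.0.
weights = balanced_class_weights([0] * 99 + [1])
print(weights[1])  # 50.0
```

Class weighting is only one way to handle imbalance; resampling or threshold tuning are alternatives, and which works best depends on the data.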
