As asked
We need to score transactions for fraud within 50ms. Features include user 30-day spend history, merchant risk score, and device fingerprint. Design a feature store that serves these features with low latency.
Sample answer outline
The candidate should separate the offline store (batch-computed features in a data warehouse or object store) from the online store (low-latency key-value lookups in Redis or DynamoDB). A feature pipeline computes aggregates such as 30-day spend in batch and writes them to both stores; a point-in-time-correct training dataset is then assembled from the offline store by joining on each event's timestamp, so that no feature value reflects data from after the transaction being scored. They should discuss cache invalidation, TTLs for the spend aggregates, and how to handle cold starts for new users.
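A minimal sketch of the online read path a strong answer might describe, assuming an in-memory dictionary as a stand-in for Redis or DynamoDB. The names `OnlineStore`, `fetch_features`, and `COLD_START_DEFAULTS`, the feature keys, and the default values are all hypothetical and chosen only for illustration; a real deployment would rely on the store's native TTL support and a batched multi-key read.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fallback values used when an entity has no stored features.
COLD_START_DEFAULTS = {
    "spend_30d": 0.0,
    "merchant_risk": 0.5,
    "device_seen_before": False,
}


class OnlineStore:
    """In-memory stand-in for a key-value online store with per-row TTLs."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # (entity_type, entity_id) -> (expires_at, features)
        self._rows: Dict[Tuple[str, str], Tuple[float, dict]] = {}

    def put(self, entity_type: str, entity_id: str,
            features: dict, ttl_seconds: float) -> None:
        expires_at = self._clock() + ttl_seconds
        self._rows[(entity_type, entity_id)] = (expires_at, dict(features))

    def get(self, entity_type: str, entity_id: str) -> Optional[dict]:
        key = (entity_type, entity_id)
        row = self._rows.get(key)
        if row is None:
            return None
        expires_at, features = row
        if self._clock() >= expires_at:
            # Treat an expired row as missing so stale aggregates are never served.
            del self._rows[key]
            return None
        return features


def fetch_features(store: OnlineStore, user_id: str,
                   merchant_id: str, device_id: str) -> dict:
    """Assemble one feature vector, falling back to defaults on a miss."""
    user = store.get("user", user_id) or {}
    merchant = store.get("merchant", merchant_id) or {}
    device = store.get("device", device_id) or {}
    features = {**COLD_START_DEFAULTS, **user, **merchant, **device}
    # Expose the miss to the model instead of hiding it behind a default.
    features["user_features_missing"] = not user
    return features
```

Passing the clock in as a parameter keeps expiry testable without sleeping. Flagging a missing user row as its own feature is one possible cold-start choice; it lets the model distinguish a genuinely new user from one whose 30-day spend is zero.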
Expect these follow-ups
- How do you prevent training-serving skew when the online store computes a feature differently from the batch pipeline?
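One direction a candidate might take on the skew follow-up is to define each feature once and call that single definition from both the training-set builder and the online writer. The sketch below assumes transactions are available as `(timestamp, amount)` pairs; `spend_30d`, `build_training_row`, and `compute_online_value` are hypothetical names, and the half-open window is one reasonable convention rather than a prescribed one.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

WINDOW = timedelta(days=30)

Transaction = Tuple[datetime, float]  # (timestamp, amount)


def spend_30d(transactions: Iterable[Transaction], as_of: datetime) -> float:
    """Sum amounts with as_of - 30 days <= timestamp < as_of.

    The upper bound is exclusive, so the transaction being scored
    and anything after it never leak into the feature.
    """
    start = as_of - WINDOW
    return sum(amount for ts, amount in transactions if start <= ts < as_of)


def build_training_row(history: Iterable[Transaction],
                       event_ts: datetime, label: int) -> dict:
    # Offline path: evaluate the feature as of the labeled event's timestamp.
    return {"spend_30d": spend_30d(history, as_of=event_ts), "label": label}


def compute_online_value(history: Iterable[Transaction], now: datetime) -> float:
    # Online path: same definition, evaluated as of the current time.
    return spend_30d(history, as_of=now)


history = [
    (datetime(2024, 1, 1), 100.0),
    (datetime(2024, 1, 20), 50.0),
    (datetime(2024, 2, 10), 25.0),
]
training_row = build_training_row(history, datetime(2024, 1, 25), label=0)
online_value = compute_online_value(history, datetime(2024, 2, 15))
```

Because both paths call `spend_30d`, any change to the window or its boundary conditions applies to training and serving together. Sharing the definition does not remove skew caused by data arriving late in one path, which still needs separate monitoring.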