As asked
Explain the tradeoff between a normalized data model and a denormalized wide table in an analytics warehouse context. When would you choose each?
Sample answer outline
Normalization reduces redundancy and storage, makes updates easy, and is appropriate for OLTP workloads where rows are updated frequently. Denormalization pre-joins tables into wide flat tables, which speeds up read-heavy analytical queries because fewer joins are needed at query time. In analytics warehouses, storage is cheap and reads are the dominant workload, so some denormalization of mart tables is appropriate. However, over-denormalization creates maintenance problems when the same column appears in dozens of tables and needs updating everywhere. A strong answer discusses the Kimball star schema as the standard balance: normalized dimension tables and a denormalized fact with foreign keys.
Expect these follow-ups
- What is a star schema and how does it differ from a snowflake schema?
- When would you build a flat wide table instead of a star schema?