Snowflake vs. Star Schemas
Both star and snowflake schemas have a central fact table referencing several dimension tables. The difference between these two structures arises from how their dimension tables are stored– normalized (snowflake) or denormalized (star).
Snowflake Schema
Snowflake schemas follow a tree-like structure with dimension tables connected to one another in a hierarchy.
For example, if our facts are sales, a buyer dimension table could hold the gender and date of birth of the buyers, as well as a foreign key referencing another dimension table that contains the buyers’ city.
For attributes with different levels of granularity, each level of the snowflake schema corresponds to a specific level of the dimension. For example, we could have the following dimension levels, each corresponding to a new table:
time dimension
Level 1 | Year |
---|---|
Level 2 | Quarter |
Level 3 | Month |
Level 4 | Week |
Level 5 | Day |
Because normalization reduces data redundancy, snowflake schemas are generally easier to maintain and more memory efficient.
Star Schema
Star schemas are denormalized, meaning the rows in the dimension tables don’t contain any references.
As a result, this structure is only one level deep:
Star schemas address the issue of slow JOIN
operations. We can think of denormalization as the result of carrying out the corresponding JOIN
for every foreign key in our dimension tables, which speeds up queries.
In comparison to snowflake structures, the denormalized tables in star schemas take up more space in memory by storing redundant data, which also hinders maintenance with the risk of inconsistencies appearing if one instance is updated and another is not.
31%
CompletedYou have 50 sections remaining on this learning path.