Back to Data Engineering Interview
Data Engineering Interview

Data Engineering Interview

23 of 73 Completed

Snowflake vs. Star Schemas

Both star and snowflake schemas have a central fact table referencing several dimension tables. The difference between these two structures arises from how their dimension tables are stored– normalized (snowflake) or denormalized (star).

Snowflake Schema

Snowflake schemas follow a tree-like structure with dimension tables connected to one another in a hierarchy.

image

For example, if our facts are sales, a buyer dimension table could hold the gender and date of birth of the buyers, as well as a foreign key referencing another dimension table that contains the buyers’ city.

For attributes with different levels of granularity, each level of the snowflake schema corresponds to a specific level of the dimension. For example, we could have the following dimension levels, each corresponding to a new table:

time dimension

Level 1 Year
Level 2 Quarter
Level 3 Month
Level 4 Week
Level 5 Day

Because normalization reduces data redundancy, snowflake schemas are generally easier to maintain and more memory efficient.

Star Schema

Star schemas are denormalized, meaning the rows in the dimension tables don’t contain any references.

As a result, this structure is only one level deep:

image

Star schemas address the issue of slow JOIN operations. We can think of denormalization as the result of carrying out the corresponding JOIN for every foreign key in our dimension tables, which speeds up queries.

In comparison to snowflake structures, the denormalized tables in star schemas take up more space in memory by storing redundant data, which also hinders maintenance with the risk of inconsistencies appearing if one instance is updated and another is not.

Good job, keep it up!

31%

Completed

You have 50 sections remaining on this learning path.