Interview Query | Dimension And Fact Tables - Data Engineering Interview

Data Engineering Interview

23 of 73 Completed

Dimension And Fact Tables

Dimensional modeling starts with dividing the data into fact and dimension tables.

Fact Tables

Fact tables store the numbers we’re interested in, and dimension tables contain the categories, or dimensions, that determine how the data is aggregated within our queries.

Facts represent the events that occur in the specific business process we’re recording. Each row in a fact table holds the data measured for that event. For example, if the events represent sales, the fact table could include the number of items sold, the cost of each item, and the profits made in the sale.

Rows in fact tables also include foreign keys to rows in dimension tables that represent the dimensions for each fact. These foreign keys are small and unique integers that remain constant and are not intended for user recall. They shouldn’t convey any meaning or require any complex processes to create them (such as joining natural keys together).

Dimension Tables

Dimensions are commonly described as the WHO, WHAT, WHEN, and WHERE for each fact.

For example, some dimensions of a sale could include:

Who the buyer is
What they bought
When the sale happened
Where the sale took place

Each dimension consists of more than a single value. In order for dimensional modeling to speed up analytics, each row in a dimension table must hold all the information we need to aggregate metrics efficiently.

For example, a row in the time dimension won’t only include the DATETIME in which the sale happened. It would also include the DAY, MONTH, YEAR, HOUR, and any other time units of interest. If we wanted to know how many sales were made at each hour, we could group values by a single column instead of processing each DATETIME value, making querying faster and more efficient.

This creates a hierarchy of many-to-one relationships: there are many hours in a day, many days in a month, etc. Each of these many-to-one relationships defines the levels of the dimension, or more precisely, the levels of granularity.

In general, each row in a dimension table includes every attribute we’re interested in for analytics in every level of granularity we need. So, if we’re interested in how much people buy on average depending on the year and decade they were born in, we would include both values separately.

After dimensional modeling, we’re left with a fact table referencing several dimension tables.

The resulting structure can have different shapes depending on the dimension tables, which could be normalized or denormalized. This creates the two main data warehouse schemas in dimensional modeling: the snowflake and star schemas.

We’ll take a closer look at these structures in the following section.

Introduction to Dimensional Modeling

Snowflake vs. Star Schemas

Good job, keep it up!

31%

Completed

You have 50 sections remaining on this learning path.