Back to Data Engineering Interview
Data Engineering Interview

Data Engineering Interview

23 of 73 Completed

Dimensional Modeling Framework

Dimensional modeling begins with a business process and ends with either a star or snowflake schema.

The general framework for answering dimensional modeling interview questions consists of five steps:

1. Identifying The Business Process

The first step in dimensional modeling is to identify the business process within the prompt clearly. This involves describing the process from a low-level perspective in terms of the core actions that make up the process, such as taking an order, creating an invoice, receiving a payment, etc.

This description needs to be as concrete as possible, as the rest of the design process will be centered around each of these core events or facts.

Although it’s easy to get caught up in the details of determining a specific schema, it’s important to always start out with a very clear idea of the business process in order to build the most optimal data warehouse.

2. Declaring The Granularity of The Model

Each dimension table can hold data with different levels of granularity depending on how the data will be aggregated. For example, we could store locations as countries, but states, cities, neighborhoods, or even full addresses are other options to consider. Dimension tables can hold more than one of these attributes if we want to perform analytics over different aggregation levels.

The same considerations must be taken into account for the events that are recorded. What granularity can be used to describe actions? What are the atomic events of the business process? (in other words, what are the “smallest” events we want to analyze?). It is possible to describe a process with different levels of detail, so you will need to find the description that works best in the context of the problem. For example, you can consider a sale to be a single event, or you can divide it into smaller events (taking an order, creating an invoice, etc…) depending on the level of detail your problem requires. The “smallest” events you wish to consider will be your atomic events.

In this step of the framework, it’s important to identify the lowest level of granularity that’s needed for carrying out future business decisions. You should have a clear idea of the atomic events that will be recorded, as well as the level of detail used to describe each of their attributes.

3. Identifying Dimensions

This step involves defining what each dimension represents– the WHO,WHAT, WHEN, and WHERE associated with each of the events you want to record. For example, if we’re looking at sales, important dimensions could include the employee who carried out the sale, the sold product, the customer, the date of the sale, etc.

For each dimension, determine the attributes that will be significant when carrying out queries. For example, we could want to know the gender and date of birth of customers, as well as their address and their subscription status to the company’s newsletter for the WHO dimension.

4. Identifying Facts

Each row in the fact table represents an atomic event and will reference each of the dimension tables you’ve identified.

To complete our fact table, you’ll need to determine the numbers you want to measure for each of the events. What are you measuring? What is the quantifiable information you need to store?

Going back to the sales example, your facts could include the value of each sale, the commission made by the sellers, the cost of the product sold, and the profits made by the sale.

5. Building A Schema

Once you identify the facts and dimensions, you’ll need to decide whether to organize them into a star or snowflake schema. To do this, we’ll evaluate our business process to determine which schema will be more convenient.

If we have old data that needs to be updated and maintained or large amounts of stored data, the normalization provided by Snowflake schemas will be more helpful. However, if we need to perform frequent queries or if there are requirements for fast querying speed, star schemas may be the best option.

Good job, keep it up!

31%

Completed

You have 50 sections remaining on this learning path.