Using Surrogate Keys to load the Data Warehouse Dimensions

No Comments

Written by Luis Miguel – Originally Published on Sept. 14, 2014

One of the most practical reasons for the use of surrogate keys when loading dimensions in a data warehouse is the need to aggregate data from disparate systems.

Implementing a data warehouse works well because it brings together data from two or more systems that exist independent of each other but need to be analyzed together.

Very frequently, companies acquire other businesses or are a conglomerate of small companies that operate independently. Modern data management techniques allow us to see a global perspective through the use of a data warehouse.

One of the challenges for a data warehouse architect is to build a system that homogenizes those autonomous data sources. In the example I use in this post, we see that a holdings company owns two paint distribution/retail operations. One of the entities in both of the transactional systems is named “colors”, and color is a dimension in the data warehouse because we need to analyze sales and other performance indicators by color.

The color tables in each of the transactional systems uses a primary key (color_id); in this case, the primary keys are integers. As expected, these keys cannot be reused in the dimension because they overlap/conflict, but we need to resolve this inconsistency to populate the color dimension and the sales fact table.

To demonstrate, the database for the Colorama store has a color named “White” with a primary key of “1”, while the primary key for “White” in the Paintmart system is “2”.

In this very simple example I’m going to use an intermediate “keymap” table where I’m going to do the mapping (and translation) from the transactional systems’ primary keys (color_id) to the dimension table surrogate key (dim_color_key), so that we have a consistent unique key for every color no matter where it comes from. We then use this surrogate key in both the color dimension and the sales fact tables.

In the COLORS table for the transactional system of Colorama; Red is color_id 4

In the COLORS table for the transactional system of Paintmart; Red is color_id 3

In dim_color in the data warehouse the surrogate key for Red is 4

The keymap table in the data warehouse does the translation from color_id in the transactional entities into dim_color_key in the fact table

detail_id 1-2 and 23-1 are sales of Red paint

line_id 5-2, 11-1, and 15-1 are sales of Red paint


Observe how in fact_sales all the rows for sales of Red paint have been translated into the surrogate key dim_color_key with a value of 4

This same technique can be used to do the translation of other entity to dimension data, for example in this scenario the same situation is present in the “size” table.

Powered by Netfirms