SQL Server Slowly Changing Dimensions
Pre-requisite: Understand what a dimension in a datawarehouse means
Nothing in life is for permanent. The same applies to the underlying data at your data warehouse or data marts. In the following text I wish to highlight one of the brilliant aspects of data upserts (INSERT and/or UPDATE). The information provided will elaborate more about the Slowly Changing Dimension or commonly attributed as SCD and its types.
To begin with, there exist 6 different types of SCD implementation that are possible on any enterprise DW implementation. Though, it is rare to experience a SCD type beyond type 3.
Let’s take this in parts, so as to not burden you with lots and lots of information. In the part 1 of the series I will take Type 0, Type 1, Type 2 and Type 3.
In the follow up section, I plan to take Type 4, Type 6 and lastly the Combining type.
Type | Description |
Type 0 | Fixed. No history preservation |
Type 1 | Overwrite old data with new data |
Type 2 | Create new record by tuple versioning |
Type 3 | Maintain change data in a separate column |
Type 4 | Maintain change in a separate history table |
Type 6 | Concept derived using Type 1,2 and 3 |
Combining Type | A combination of transformation on the underlying dimension data |
Type 0
The dimension data is assumed to be fixed i.e. the data will never change. Any change data will be ignored and will not be validated.
Specific scenario:
Earlier:
Natural Key | Sales Person ID | First Name | Last Name | State |
S123456 | 1234 | Raunak | Jhawar | MH |
Now:
Natural Key | Sales Person ID | First Name | Last Name | State |
S123456 | 1234 | Raunak | Jhawar | UP |
If the ‘State’ column is following Type 0 SCD transformation, the value of State will continue to remain as ‘MH’, irrespective of its new value i.e. ‘UP’
In lighter terms, any updates are ignored and the data will new be versioned or updated.
Expert comment: This approach for data management is not advisable. The data in the warehouse is bound to change over a period of time. The underlying DW implementation must be designed in a manner to accommodate any/all changes.
Type 1
The dimension data is fixed. But any or all changes will be validated and not ignored as in Type 0.
Specific scenario:
Earlier:
Natural Key | Sales Person ID | First Name | Last Name | State |
S123456 | 1234 | Raunak | Jhawar | MH |
Now:
Natural Key | Sales Person ID | First Name | Last Name | State |
S123456 | 1234 | Raunak | Jhawar | UP |
If the ‘State’ column is following Type 1 SCD transformation, the value of State will no longer continue to remain as ‘MH’, its new value will now be ‘UP’
In lighter terms, any updates are not ignored and the data will new be updated.
Expert comment: This approach for data management is better than Type 0, but cannot be graded as the best approach. If you see the tabular representation, earlier the state had a value of ‘MH’ which was updated to ‘UP’. Once the column is updated it will appear to the user that the current value is the only value the data represented, which is not the case.
Type 2
Pre-requisite: Understand what a surrogate key means and its practical usability.
Surrogate Key: A random meaning-less integer value derived from the ‘Natural Key’. This key is used as a reference to make table joins between the dimension table and the fact table(s) of the OLAP system.
The dimension data is not updated as in Type 1, which forced to lose all history of the underlying data. Here, a new dimension row is inserted into the dimension table to facilitate the change. The natural key remains the same. A new surrogate key is created. An effective timestamp value is updated also, at times implementation can use a boolean flag value to indicate status of the row in study. The example which follows will help you to understand the Type 2 more clearly.
Specific scenario:
Earlier:
Natural Key | Person ID | FName | LName | State | Surrogate Key | From Date | To Date | Current |
S123456 | 1234 | Raunak | Jhawar | MH | 23682164 | 1-1-1990 | – | Y |
Now:
Natural Key | Person ID | FName | LName | State | Surrogate Key | From Date | To Date | Current |
S123456 | 1234 | Raunak | Jhawar | UP | 27384687 | 24-1-1990 | 23-2-1990 | Y |
S123456 | 1234 | Raunak | Jhawar | MH | 23682164 | 1-1-1990 | 23-1-1990 | N |
Pros:
- A thorough history of all changes made to the dimension entity is maintained in the dimension table with appropriate effective date and timestamps.
Cons:
- The dimension table can grow very large in size, for every change encountered; a new data row is created. Though this can be gracefully handled using the SCD Type 4 – History table method which will be shortly introduced to you in the following series.
Type 3
So far, you have learnt about the major SCD types and methods you can handle the upserts on them. Onwards, the topics are fairly advanced and require very specific circumstances to implement them. In most of the real life EDW implementations, you will rarely go beyond Type 2. So, I must congratulate you for learning a very important aspect of any DW implementation.
In Type 3 transformations, the good part is that you can track history change and save yourself by not creating all the additional data rows on your dimension table(s).
Now about the part, which I would call ‘not so good’ is that you will never be able to track all the changes made to the dimension entity. So let us understand more about the very unique Type 3 SCD technique.
Specific scenario:
Earlier:
Natural Key | Person ID | FName | LName | Original State | Surrogate Key | Current State | Effective Date |
S123456 | 1234 | Raunak | Jhawar | MH | 23682164 | MH | 1-1-1900 |
Now:
Natural Key | Person ID | FName | LName | Original State | Surrogate Key | Current State | Effective Date |
S123456 | 1234 | Raunak | Jhawar | MH | 23682164 | UP | 23-1-1900 |
Spot the change. The change is that no new data row is created; instead we have 2 additional columns here to rack the change. The ‘current state’ and ‘effective date’ column value will suffice the user to track the change.
Expert Comment: Type 3 is rarely encountered. You may implement when you are sure that the column will not change frequently.
Tune in for the remaining SCD Types in the follow up chapter.
Regards
Raunak Jhawar
Like us on FaceBook | Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook
Follow me on Twitter | Follow me on FaceBook
Your SCD type 0 is wrong
Rest article is in sense
Right. Nice catch! Type mistake, though