Showing posts with label Slowly Changing Dimensions. Show all posts
Showing posts with label Slowly Changing Dimensions. Show all posts

Saturday, June 29, 2019

SCD Type 2 Flag

SCD Type 2 Flag is the method used to store historical data is maintained along with current data.


- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on target table.
- On Primary key columns you should define your condition, because its the matching column between source and target.


- Create an Expression, drag the ports from lookup transformation and name as prev record. This tells us that records are old ones which are used to compare with new ones.
- Drag the comparision keys from source qualifier to Expression
- Create two ports as Newflag and Changedflag and give syntax as Isnull (Cust_key) and Not Isnull(Cust_key) and (Prev_Empn != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal)



- First syntax will check for nulls. If there are null, lookup will insert data into the target table.
- Second syntax will  check for existing rows. If there are rows, lookup will update data into the target table.
- Create Filter1 and drag all the ports from source qualifier and give filter condition as NEWFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_INSERT



- Create Filter2 and drag Cust key and comparision keys from source and give filter condition as CHANGEFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_INSERT
- Create Sequence gen, connect it to Expression and connect the NEXTVAL port to CUST_KEY ports in both Target instances 1 & 2 
- Also create FLAG port, assign value 1 to it and connect port to both Target instances 1 & 2
- Create another Expression, drag CUST_KEY to it from Filter2 and create FLAG port, Give the value as 0.
- Create a new update strategy and give the 'Update Strategy Expression' as DD_UPDATE
- Make three instances of target, Connect update strategy 1 to target 1 and update strategy 2 to target 2, Connect update strategy 3 to target 3
- Here First pipeline inserts new data into the target, Second pipeline inserts changed data into the target where as Third pipeline updates changed data into the target. Connect only that ports which you want to insert and update in the target table
- Final mapping looks like below screen shot




Note 1 : Lookup condition should be on logical key column
Note 2 : lookup transformation should contain Logical key, Primary key & Comparison key columns.
Note 3 : Second pipeline in the mapping contains comparison key, primary key & changed flag columns.

Note 4 : Expression transformation should contain only primary key, comparision keys along with related source keys and new and changed flag.

SCD Type 2 Effective Date

SCD Type 2 Effective Date is the method used to store historical data is maintained along with current data.


- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on target table.
- On Primary key columns you should define your condition, because its the matching column between source and target.



- Create an Expression, drag the ports from lookup transformation and name as prev record. This tells us that records are old ones which are used to compare with new ones.
- Drag the comparision keys from source qualifier to Expression
- Create two ports as Newflag and Changedflag and give syntax as Isnull (Cust_key) and Not Isnull(Cust_key) and (Prev_Empn != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal)



- First syntax will check for nulls. If there are null, lookup will insert data into the target table.
- Second syntax will  check for existing rows. If there are rows, lookup will update data into the target table.
- Create Filter1 and drag all the ports from source qualifier and give filter condition as NEWFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_INSERT



- Create Filter2 and drag Cust key and comparision keys from source and give filter condition as CHANGEFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_INSERT
- Create another Expression, drag CUST_KEY to it from Filter2 and create END_DATE port. Give the syntax as SYSDATE.
- Create a new update strategy and give the 'Update Strategy Expression' as DD_UPDATE
- Make three instances of target, Connect update strategy 1 to target 1 and update strategy 2 to target 2, Connect update strategy 3 to target 3
- Here First pipeline inserts new data into the target, Second pipeline inserts changed data into the target where as Third pipeline updates changed data into the target. Connect only that ports which you want to insert and update in the target table
- Create Seq Gen and connect to Expression with BEGIN_DATE and assign value as SYSDATE. Connect it to Target instance 1
- Final mapping looks like below screen shot




Note 1 : Lookup condition should be on logical key column
Note 2 : lookup transformation should contain Logical key, Primary key & Comparison key columns.
Note 3 : Second pipeline in the mapping contains comparison key, primary key & changed flag columns.
Note 4 : Expression transformation should contain only primary key, comparision keys along with related source keys and new and changed flag.
















Sunday, April 14, 2019

SCD Type 2 Version

In this SCD Type 2 historical data is maintained along with current data. SCD Type 2 concept can be implemented with 3 different methods. Different methods are Version, Flag and Effective Date.
First we will focus on Version method.


 Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on target table.
- On Primary key columns you should define your condition, because its the matching column between source and target.



- Create an Expression, drag the ports from lookup transformation and name as prev record. This tells us that records are old ones which are used to compare with new ones.
- Drag the comparision keys from source qualifier to Expression
- Create two ports as Newflag and Changedflag and give syntax as Isnull (Cust_key) and Not Isnull(Cust_key) and (Prev_Empn != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal)




- First syntax will check for nulls. If there are null, lookup will insert data into the target table.
- Second syntax will  check for existing rows. If there are rows, lookup will update data into the target table.
- Create Filter1 and drag all the ports from source qualifier and give filter condition as NEWFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_INSERT


  
- Create Filter2 and drag Cust key and Version ports from expression and comparision keys from source and give filter condition as CHANGEFLAG
- Create another Expression, connect all ports from filter2 and create output assign +1 value to Version port as below. 



- Connect to update strategy transformation.
- Connect update strategy and give the 'Update Strategy Expression' as DD_UPDATE



- Make two instances of target, Connect update strategy 1 to target 1 and update strategy 2 to target 2. 
- Create Seq Gen and connect to Expression with version port and assign value as 1. Connect it to Target
- Final mapping looks like below screen shot



Note 1 : Lookup condition should be on logical key column
Note 2 : lookup transformation should contain Logical key, Primary key & Comparison key columns.
Note 3 : Second pipeline in the mapping contains comparison key, primary key & changed flag columns.

Note 4 : Expression transformation should contain only primary key, comparision keys along with related source keys and new and changed flag.






Saturday, April 13, 2019

SCD Type 1

Previously we use to check the option truncate table but when ever we use to load data into target our previous data used to be deleted and current data used to be inserted.

Now we deal with all the historical and current data. In this SCD type 1 historical data is overwritten with current data. So as a result we do not maintain history but we maintain only current data.

In this SCD type 1 all the historical data is lost as a result.

- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on target table.
- On Primary key columns you should define your condition, because its the matching column between source and target.


- Create an Expression, drag the ports from lookup transformation and name as prev record. This tells us that records are old ones which are used to compare with new ones.
- Drag the comparision keys from source qualifier to Expression.
- Create two ports as Newflag and Changedflag and give syntax as Isnull (Cust_key) and Not Isnull(Cust_key) and (Prev_Empn != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal)




- First syntax will check for nulls. If there are null, lookup will insert data into the target table.
- Second syntax will  check for existing rows. If there are rows, lookup will update data into the target table.
- Create Filter1 and drag all the ports from source qualifier and give filter condition as NEWFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_INSERT


 - Create Filter2 and drag Cust key from expression and comparision keys from source and give filter condition as CHANGEFLAG
- Connect update strategy and give the 'Update Strategy Expression' as DD_UPDATE



- Make two instances of target, Connect update strategy 1 to target 1 and update strategy 2 to target 2. 
- Connect Seq Gen to the cust_key column of target
- Final mapping looks like below screen shot



Note 1 : Lookup condition should be on logical key column
Note 2 : lookup transformation should contain Logical key, Primary key & Comparison key columns.
Note 3 : Second pipeline in the mapping contains comparison key, primary key & changed flag columns.

Note 4 : Expression transformation should contain only primary key, comparision keys along with related source keys and new and changed flag.

Friday, April 12, 2019

Introduction to SCD's

Lets consider the example of a company which was started many years ago. At the starting company will store all the information about the employees, products, location and so on. Later on when employees may resign, change their location, company may launch new products, their may be change in their salary over a period of time.

So their will be problem of data mismatch. In order to overcome the problem a new technique is introduced called as "Slowly Changing Dimensions".
Here the dimension tables will be updated based on their priority.

These Slowly Changing Dimensions are categorized into 3 different types based on their functions. They are 
- SCD 1
- SCD 2
- SCD 3

SUBQUERIES PRACTISE QUESTIONS

1. Write a SQL query to find those employees who receive a higher salary than the employee with ID 7369. SELECT * FROM EMP WHERE SAL >  (...