Saturday, June 29, 2019

SCD Type 2 Flag

SCD Type 2 Flag is the method in which historical data is maintained along with current data; a flag column marks which record is the current one.


- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on the target table.
- Define the lookup condition on the primary key column, because it is the matching column between source and target.


- Create an Expression transformation, drag the ports from the Lookup transformation into it and prefix them as PREV (previous record). These are the old records, which are compared with the incoming ones.
- Drag the comparison keys from the Source Qualifier to the Expression.
- Create two output ports, NEWFLAG and CHANGEDFLAG, and give the expressions as ISNULL(Cust_key) and NOT ISNULL(Cust_key) AND (Prev_Empno != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal) respectively.



- The first expression checks for nulls: if the lookup returns null, the record does not exist in the target and will be inserted.
- The second expression checks for existing rows: if a matching row exists and the comparison columns differ, the record is treated as changed and will be updated in the target.
- Create Filter1, drag all the ports from the Source Qualifier into it and give the filter condition as NEWFLAG.
- Connect an Update Strategy and set the 'Update Strategy Expression' to DD_INSERT.



- Create Filter2, drag CUST_KEY and the comparison keys from the source and give the filter condition as CHANGEDFLAG.
- Connect an Update Strategy and set the 'Update Strategy Expression' to DD_INSERT.
- Create a Sequence Generator, connect it to the Expression and connect the NEXTVAL port to the CUST_KEY ports in both Target instances 1 & 2.
- Also create a FLAG port, assign the value 1 to it and connect the port to both Target instances 1 & 2.
- Create another Expression, drag CUST_KEY into it from Filter2 and create a FLAG port with the value 0.
- Create a new Update Strategy and set the 'Update Strategy Expression' to DD_UPDATE.
- Make three instances of the target. Connect Update Strategy 1 to Target 1, Update Strategy 2 to Target 2 and Update Strategy 3 to Target 3.
- Here the first pipeline inserts new data into the target, the second pipeline inserts changed data into the target, whereas the third pipeline updates the changed data (flags the old version). Connect only those ports which you want to insert or update in the target table (see the SQL sketch after the notes below).
- The final mapping looks like the screenshot below.




Note 1 : The lookup condition should be on the logical key column.
Note 2 : The Lookup transformation should contain the logical key, primary key & comparison key columns.
Note 3 : The second pipeline in the mapping contains the comparison key, primary key & changed-flag columns.

Note 4 : The Expression transformation should contain only the primary key and comparison keys, along with the related source keys and the new and changed flags.
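
The equivalent insert/update logic can be sketched in plain SQL. This is only a minimal illustration, assuming a hypothetical target table DIM_EMP with surrogate key CUST_KEY (fed by a hypothetical sequence DIM_EMP_SEQ), logical key EMPNO, comparison columns ENAME and SAL, and a current-record FLAG:

-- Hypothetical target structure for the SCD Type 2 Flag approach
CREATE TABLE DIM_EMP (
    CUST_KEY  NUMBER PRIMARY KEY,   -- surrogate key (Sequence Generator NEXTVAL)
    EMPNO     NUMBER,               -- logical key from the source
    ENAME     VARCHAR2(30),         -- comparison column
    SAL       NUMBER,               -- comparison column
    FLAG      NUMBER(1)             -- 1 = current record, 0 = historical record
);

-- Pipelines 1 & 2: insert brand-new and changed records as current rows (FLAG = 1)
INSERT INTO DIM_EMP (CUST_KEY, EMPNO, ENAME, SAL, FLAG)
VALUES (DIM_EMP_SEQ.NEXTVAL, :empno, :ename, :sal, 1);

-- Pipeline 3: retire the old version of a changed record (FLAG = 0)
UPDATE DIM_EMP
SET    FLAG = 0
WHERE  CUST_KEY = :old_cust_key;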

SCD Type 2 Effective Date

SCD Type 2 Effective Date is the method in which historical data is maintained along with current data; BEGIN_DATE and END_DATE columns record the period for which each version of a record was valid.


- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on the target table.
- Define the lookup condition on the primary key column, because it is the matching column between source and target.



- Create an Expression transformation, drag the ports from the Lookup transformation into it and prefix them as PREV (previous record). These are the old records, which are compared with the incoming ones.
- Drag the comparison keys from the Source Qualifier to the Expression.
- Create two output ports, NEWFLAG and CHANGEDFLAG, and give the expressions as ISNULL(Cust_key) and NOT ISNULL(Cust_key) AND (Prev_Empno != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal) respectively.



- The first expression checks for nulls: if the lookup returns null, the record does not exist in the target and will be inserted.
- The second expression checks for existing rows: if a matching row exists and the comparison columns differ, the record is treated as changed and will be updated in the target.
- Create Filter1, drag all the ports from the Source Qualifier into it and give the filter condition as NEWFLAG.
- Connect an Update Strategy and set the 'Update Strategy Expression' to DD_INSERT.



- Create Filter2, drag CUST_KEY and the comparison keys from the source and give the filter condition as CHANGEDFLAG.
- Connect an Update Strategy and set the 'Update Strategy Expression' to DD_INSERT.
- Create another Expression, drag CUST_KEY into it from Filter2 and create an END_DATE port. Give the expression as SYSDATE.
- Create a new Update Strategy and set the 'Update Strategy Expression' to DD_UPDATE.
- Make three instances of the target. Connect Update Strategy 1 to Target 1, Update Strategy 2 to Target 2 and Update Strategy 3 to Target 3.
- Here the first pipeline inserts new data into the target, the second pipeline inserts changed data into the target, whereas the third pipeline updates the changed data (closes the old version). Connect only those ports which you want to insert or update in the target table (see the SQL sketch after the notes below).
- Create a Sequence Generator and connect it to an Expression with a BEGIN_DATE port whose value is SYSDATE. Connect it to Target instance 1.
- The final mapping looks like the screenshot below.




Note 1 : The lookup condition should be on the logical key column.
Note 2 : The Lookup transformation should contain the logical key, primary key & comparison key columns.
Note 3 : The second pipeline in the mapping contains the comparison key, primary key & changed-flag columns.
Note 4 : The Expression transformation should contain only the primary key and comparison keys, along with the related source keys and the new and changed flags.
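
For the effective-date variant the FLAG column is replaced by BEGIN_DATE and END_DATE. A minimal SQL sketch of the same three pipelines, again assuming a hypothetical DIM_EMP target and DIM_EMP_SEQ sequence:

-- Pipelines 1 & 2: insert new and changed records with an open effective period
INSERT INTO DIM_EMP (CUST_KEY, EMPNO, ENAME, SAL, BEGIN_DATE, END_DATE)
VALUES (DIM_EMP_SEQ.NEXTVAL, :empno, :ename, :sal, SYSDATE, NULL);

-- Pipeline 3: close the previous version of a changed record
UPDATE DIM_EMP
SET    END_DATE = SYSDATE
WHERE  CUST_KEY = :old_cust_key;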
















Monday, June 24, 2019

Normalizer Transformation

Normalizer Transformation is an Active and Connected transformation. It converts a single row of data into multiple rows, i.e. it converts a de-normalized table into a normalized table. You cannot drag & drop columns into a Normalizer transformation like you can with the rest of the transformations.

* Normalizer is used to convert columns into rows.
* Normalizer is used in place of the Source Qualifier while reading a mainframe or COBOL source.

Steps to create Normalizer transformation :-
-----------------------------------------------------------------
- Go to the Transformations tab and select Normalizer
- Double-click the Normalizer and select the Normalizer tab
- Add the occurs based on the requirement







In the Normalizer properties tab, 'Reset' and 'Restart' are the two options available.

Reset restores the generated key (GK) sequence to the value it had before the session ran.
Restart starts the GK sequence from 1 again for each session.
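
Outside Informatica, the same column-to-row conversion that the Normalizer performs can be written in SQL. A sketch, assuming a hypothetical denormalized table SALES_QTR(STORE_ID, Q1_SALES, Q2_SALES, Q3_SALES, Q4_SALES):

-- One input row per store becomes four output rows, one per quarter
SELECT STORE_ID, 1 AS QUARTER, Q1_SALES AS SALES FROM SALES_QTR
UNION ALL
SELECT STORE_ID, 2, Q2_SALES FROM SALES_QTR
UNION ALL
SELECT STORE_ID, 3, Q3_SALES FROM SALES_QTR
UNION ALL
SELECT STORE_ID, 4, Q4_SALES FROM SALES_QTR;

Setting the occurs to 4 on the sales column in the Normalizer gives the same effect.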




Sunday, June 23, 2019

Types Of Keys In Database

Primary Key :-
---------------
A primary key is a single field or combination of fields that uniquely identifies a row in the table.

The following are rules that make a column a primary key:
- A primary key column cannot contain a NULL value
- A primary key value must be unique within the entire table
- A primary key value should not be changed over time


Foreign Key :-
---------------
A foreign key means that values in one table must also appear in another table.
The referenced table is called the parent table while the table with the foreign key is called the child table.
The foreign key in the child table will generally reference a primary key in the parent table.


Natural Key and Surrogate Key:-
------------------------------------
Sometimes the primary key is made up of real data; these are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at run time, it is called a surrogate key.

A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. By "real data" I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record.

A surrogate key, by contrast, does not have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time, right before the record is inserted into a table.
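
A short SQL sketch showing the key types side by side, using hypothetical DEPT and EMP tables and a hypothetical DEPT_SEQ sequence:

CREATE TABLE DEPT (
    DEPT_ID   NUMBER       PRIMARY KEY,      -- surrogate key, generated at run time
    DEPT_CODE VARCHAR2(10) NOT NULL UNIQUE   -- natural key made of real business data
);

CREATE TABLE EMP (
    EMP_ID    NUMBER PRIMARY KEY,               -- primary key: unique and not null
    DEPT_ID   NUMBER REFERENCES DEPT (DEPT_ID)  -- foreign key: child references parent
);

-- Surrogate key values typically come from a sequence just before the insert
INSERT INTO DEPT (DEPT_ID, DEPT_CODE) VALUES (DEPT_SEQ.NEXTVAL, 'HR');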

Sunday, May 26, 2019

Mapplet & Worklet

Mapplet :-
----------
If you want to use the same logic in multiple mappings, use a Mapplet. Instead of creating the logic every time, we can create it once in a Mapplet and use it in many mappings.

- Go to Mapplet designer, create Mapplet input and output from transformations


- Apply the mapping logic between input and output Mapplet.


- Now apply this logic in mapping where ever there is requirement.



- If you want to show the mapplet logic inside the mapping, go to the mapping and click the Expand/Unexpand option. Select the correct one from the list of mapplets & click OK.



Worklet :-
----------
If you want to create dependencies between workflows or sessions, we use Worklets.
If you have many sessions in a workflow it is very difficult to manage the dependencies; instead of adding dependencies among the individual sessions, we use Worklets and create the dependencies among the different Worklets, which makes the work easier.

- Create a worklet, name it, and add and connect sessions to it.



- Drag worklet from left and connect all of them together in the workflow designer.




Reusable Transformation :-
--------------------------------
If we want to use a transformation again & again across mappings, we can create a reusable transformation.

Double-click the transformation & check the Reusable option to make it a reusable transformation.


We cannot edit it once it is converted into a reusable transformation; we need to go to the Transformation Developer to change anything if required. Any transformation created in the Transformation Developer is a reusable transformation by default.














Partitioning in Informatica

It is used for parallel processing in order to decrease the time needed to load data into the target. The different partition types are pass-through, round-robin, hash auto-key, hash user keys, key range and finally database partitioning.
Each type of partition works according to its own logic.

- The partitioning option is found in the session: click on the Mapping tab and you will find a 'Partitions' option at the bottom left, beside the Transformations option. Click on it.





- Look of mapping after we develop a mapping code




Pass through : The default partition type. The Integration Service processes data without redistributing rows among partitions; all rows in a single partition stay in that partition after crossing a pass-through partition point. Choose pass-through partitioning when we want to create an additional pipeline stage to improve performance but do not want to change the distribution of data across partitions.




Round robin  : The Integration Service distributes data evenly among all partitions. Use round-robin partitioning when we want each partition to process approximately the same number of rows, i.e. for load balancing.




Hash Auto-key : The Integration Service uses a hash function to group rows of data among partitions. It groups the data based on a partition key, using all grouped or sorted ports as a compound partition key. We may need to use hash auto-keys partitioning at Rank, Sorter and unsorted Aggregator transformations.

Hash user keys : The Integration Service uses a hash function to group rows of data among partitions. We define the ports that are used to generate the partition key.




Key range    : We define which range of key values each partition should process; it is completely according to our logic. The Integration Service distributes rows of data based on a port or set of ports that we define as the partition key. For each port, we define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.





Database     : It automatically checks how many partitions are available in the database and distributes rows to the partitions based on that. The Integration Service queries the database system for table partition information and reads partitioned data from the corresponding nodes in the database.


Points to consider while using Informatica partitions :-
-----------------------------------------------------------
* We cannot specify partition for Sequence generator.

* When partitioning, we should place a Sorter before the Joiner, otherwise the session fails.

* We cannot create a partition key for hash auto-keys, round-robin, or pass-through types partitioning.

* If you have a bitmap index defined on the target and you are using pass-through partitioning to, say, update the target table, the session might fail, because a bitmap index creates serious locking problems in this scenario.

* Partitioning considerably increases the total DTM buffer memory requirement for the job. Ensure you have enough free memory in order to avoid memory allocation failures.

* When you do pass-through partitioning, Informatica will try to establish multiple connection requests to the database server. Ensure that the database is configured to accept a high number of connection requests.

* As an alternative to partitioning, you may also use native database options to increase the degree of parallelism of query processing. For example, in an Oracle database you can either specify a PARALLEL hint or alter the DOP of the table in question.

* If required you can even combine Informatica partitioning with native database-level parallel options - e.g. you create 5 pass-through pipelines, each sending a query to the Oracle database with a PARALLEL hint.
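
For example, the two database-side options mentioned above look like this in Oracle (the table name and degree of 4 are only illustrative):

-- Ask Oracle to scan EMP with a degree of parallelism of 4 for this query only
SELECT /*+ PARALLEL(EMP, 4) */ *
FROM   EMP;

-- Or change the default degree of parallelism (DOP) of the table itself
ALTER TABLE EMP PARALLEL 4;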


Default partition points are the Source Qualifier, Aggregator and Target. Typical partition types per transformation:
SQ           : Pass-through
Sorter       : Hash auto-key
Router       :
Lookup       : Any partition type
Expression   : Pass-through
Aggregator   : Hash auto-key
Sorter       : Pass-through
Target       : Round-robin or Pass-through
Joiner       : 1:n; it has a different partitioning logic.





Friday, May 24, 2019

Practice Scenarios

1) How to convert single row from source to three rows in target




- In the Normalizer create Column1 and increase the occurs to 3 so that it creates 3 input ports



2) How to split the non-key columns into separate tables with the key column in both






3) How to separate duplicate and non-duplicate records into separate tables.





- In the Sorter transformation sort on the first port in ascending order.

- In the Aggregator transformation check the group-by port on Column1, then create a new port that takes a count on the same column as COUNT(Address1).
- In the Router create two groups and give the conditions as count = 1 and count > 1.
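
The same split can be cross-checked in SQL with a GROUP BY, assuming a hypothetical CUSTOMERS table and that duplicates are judged on ADDRESS1 as in the mapping above:

-- Non-duplicate records (count = 1)
SELECT ADDRESS1 FROM CUSTOMERS GROUP BY ADDRESS1 HAVING COUNT(*) = 1;

-- Duplicate records (count > 1)
SELECT ADDRESS1 FROM CUSTOMERS GROUP BY ADDRESS1 HAVING COUNT(*) > 1;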


4) How to retrieve the first and last record from a table/file.






- Give the rank as Top in one Rank transformation and Bottom in the other, and set the number of ranks as 1.

- Create two instances of the target.
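
If the source is a table, the result of this mapping can be verified with a simple SQL check on EMP, assuming "first" and "last" are defined by EMPNO:

-- First record by EMPNO
SELECT * FROM EMP WHERE EMPNO = (SELECT MIN(EMPNO) FROM EMP)
UNION ALL
-- Last record by EMPNO
SELECT * FROM EMP WHERE EMPNO = (SELECT MAX(EMPNO) FROM EMP);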


5) How to remove the footer from your file?





- In the Filter give the condition as INSTR(COL1,'footer') = 0, which drops the footer row (any row containing the word 'footer').



6) How to send 1st half rows to target1 and second half rows to target 2 ?





- Generate a count in the Aggregator and group by EMPNO.

- Generate a sequence number (o_seqno) in the Expression.
- Use a Joiner to join the Aggregator and the Expression.
- In the Router give the two conditions as 'o_seqno <= count_rec/2' and 'o_seqno > count_rec/2'.
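
The same split in SQL, where ROW_NUMBER plays the role of the Expression sequence and the analytic COUNT plays the role of the Aggregator count (ordering by EMPNO is an assumption):

WITH NUMBERED AS (
    SELECT E.*,
           ROW_NUMBER() OVER (ORDER BY EMPNO) AS RN,
           COUNT(*)     OVER ()               AS CNT
    FROM   EMP E
)
SELECT * FROM NUMBERED WHERE RN <= CNT / 2;   -- first half  -> target 1
-- ... WHERE RN > CNT / 2                     -- second half -> target 2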


7) How to send alternate records to 2 different targets ?





- Generate a sequence and connect it to an Expression.

- In the Expression transformation, add two ports, Even and Odd.
  Give the expression for the Odd port as 'NEXTVAL % 2 = 1' and for the Even port as 'NEXTVAL % 2 = 0'.
- In the Router transformation, create two groups, Even and Odd. Give the filter condition for Even as 'Even = 1' and for Odd as 'Odd = 1'.
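
The same even/odd routing expressed in SQL, with ROW_NUMBER standing in for the Sequence Generator's NEXTVAL:

WITH NUMBERED AS (
    SELECT E.*, ROW_NUMBER() OVER (ORDER BY EMPNO) AS RN
    FROM   EMP E
)
SELECT * FROM NUMBERED WHERE MOD(RN, 2) = 1;  -- odd rows  -> target 1
-- ... WHERE MOD(RN, 2) = 0                   -- even rows -> target 2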


8) How to send record into target in cyclic order ?





- In the Sequence Generator set the start value as 1, the end value as 3 and check the Cycle option so the values repeat from 1 to 3.

- In the Router create 3 groups, give the conditions as NEXTVAL = 1, NEXTVAL = 2 and NEXTVAL = 3, and connect the outputs to the targets.



9) How to generate target file names dynamically for flat files?



- Go to the target and select the 'F' option at the top right; a new column named FileName will be generated.




- In the Expression create a dynamic filename port and give it any value, like 'Test_file' or TO_CHAR(SESSSTARTTIME,'yyyymmdd24').

- Connect the dynamic filename port to the FileName port in the target.

10) How to process 'CurrentlyProcessedFileName' into the target? 





After importing the source definition, go to the Properties tab and select the option 'Add Currently Processed Flat File Name'. As a result the currently processed filename is passed through to the target.




11) How to remove special characters from a string ?





- In the Expression transformation create an O_NAME port and give the expression as REG_REPLACE(NAME,'[^A-Za-z0-9 ]','').

- This expression removes all special characters, keeping only capital letters, small letters, numbers and spaces.
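
The Oracle SQL counterpart of the expression above is REGEXP_REPLACE; for example, against the ENAME column of EMP:

-- Keep only capital letters, small letters, digits and spaces
SELECT REGEXP_REPLACE(ENAME, '[^A-Za-z0-9 ]', '') AS O_NAME
FROM   EMP;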


12) How to load the last 3 rows from a flat file into a target?




- Sort on the EMPNO port and connect to an Expression transformation as well as an Aggregator transformation.
- Create dummy ports in both the Expression and the Aggregator, in order to join both transformations with a Joiner.
- Create a count port in the Aggregator transformation.
- Check the Sorted Input option in both the Aggregator and Joiner transformations.
- In the Joiner give the condition as dummy = dummy1.
- In the Router or Filter give the condition as Count - o_seqno <= 2; as a result the last 3 rows will be inserted into the target.

OR

A Rank transformation can also be used here to make it simpler.
- Select the Bottom option with the number of ranks set to 3 in the Rank transformation & connect it to the target.
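
For comparison, the same requirement written directly in SQL against an EMP table, assuming "last" is defined by the highest EMPNO values:

-- Last 3 rows by EMPNO
SELECT *
FROM   (SELECT * FROM EMP ORDER BY EMPNO DESC)
WHERE  ROWNUM <= 3;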



13) How to load the first 3 rows into target1, the last 3 rows into target2 and the remaining rows into target3?



- This logic is almost the same as the previous one, but additional conditions and targets are added as per the requirement.
- In the Router the first condition should be O_SEQ_NO <= 3 and the second condition should be COUNT - O_SEQ_NO <= 2.
- Now connect the groups to targets 1, 2 and 3.
- The default group will return the remaining rows, which are not covered by the conditions.


14) Scenario: if you have a source with 10 records and 3 target tables, target1 should get records 1-2, target2 records 3-7 and target3 records 8-10.



- Sort on the EMPNO port and connect to an Expression transformation as well as an Aggregator transformation.
- Create dummy ports in both the Expression and the Aggregator, in order to join both transformations with a Joiner.
- Create a count port in the Aggregator transformation.
- Check the Sorted Input option in both the Aggregator and Joiner transformations.
- In the Joiner give the condition as dummy = dummy1.
- In the Router, create two groups and name them First and Last.
- Give the First group condition as Seqno <= 2 and the Last group condition as Count - o_seqno <= 2, so the last 3 rows are routed to their target.
- Finally connect the default group to the remaining target instance; the records not caught by the First and Last groups will be loaded into it.



SUBQUERIES PRACTICE QUESTIONS

1. Write a SQL query to find those employees who receive a higher salary than the employee with ID 7369.
SELECT * FROM EMP WHERE SAL > (SELECT SAL FROM EMP WHERE EMPNO = 7369);