Basic Informatica Tutorial

Sunday, June 30, 2019

Indirect File Method

Unlike the Direct method, the Indirect file method is used when we have multiple files with the same structure and want to load them into a single target.

To configure Indirect method :-
----------------------------------
- Create a flat file (the file list) and enter the path of each source flat file in it, one per line.

- Drag one of the source definitions (all the flat files share the same structure) into the mapping and develop the code.
- In the session properties, set the source file type to 'Indirect' instead of 'Direct'.
- Give the directory of the file list in the Source file directory field.
- Give the name of the file list in the Source filename field.
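The file list itself is just a text file with one path per line. As a minimal sketch, and not an Informatica feature, the Python below builds such a list from every file in a directory that matches a naming pattern. The `emp_*.csv` pattern, the region file names and `emp_list.txt` are hypothetical examples.

```python
import glob
import os
import tempfile


def write_file_list(source_dir, pattern, list_path):
    """Write the absolute path of every file in source_dir that matches
    pattern to list_path, one path per line, and return those paths."""
    paths = sorted(
        os.path.abspath(p) for p in glob.glob(os.path.join(source_dir, pattern))
    )
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return paths


def demo():
    """Create three sample files with the same layout in a temporary
    directory and build the file list for them (hypothetical names)."""
    src = tempfile.mkdtemp()
    for region in ("east", "west", "north"):
        with open(os.path.join(src, f"emp_{region}.csv"), "w") as f:
            f.write("EMPNO,ENAME,SAL\n")
    list_path = os.path.join(src, "emp_list.txt")
    write_file_list(src, "emp_*.csv", list_path)
    return list_path


if __name__ == "__main__":
    path = demo()
    with open(path) as f:
        print(f.read())
```

In the session, the directory returned by `tempfile.mkdtemp()` would go in the Source file directory field and `emp_list.txt` in the Source filename field.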

Saturday, June 29, 2019

SCD Type 2 Flag

SCD Type 2 Flag is a method in which historical data is maintained along with current data; a FLAG column marks which version of each record is current.

- Create a mapping and name it.
- Drag the EMP source into the mapping area.
- Create a Lookup transformation on the target table.
- Define the lookup condition on the primary key column of the source (the logical key), because it is the column that matches rows between source and target.

- Create an Expression transformation, drag the ports from the Lookup transformation into it and rename them with a PREV_ prefix. This marks them as the existing (old) values that are compared with the incoming ones.
- Drag the comparison keys from the Source Qualifier into the Expression.
- Create two output ports, NEWFLAG and CHANGEDFLAG, with the expressions ISNULL(CUST_KEY) and NOT ISNULL(CUST_KEY) AND (PREV_EMPNO != EMPNO OR PREV_ENAME != ENAME OR PREV_SAL != SAL).

- The first expression checks for nulls: if the lookup finds no match (CUST_KEY is NULL), the record is new and is inserted into the target table.
- The second expression checks for existing rows: if a match exists and any comparison column differs, the record has changed, so a new version is inserted and the old version is updated.
- Create Filter1, drag the source ports and NEWFLAG from the Expression into it, and set the filter condition to NEWFLAG.
- Connect an Update Strategy and set the Update Strategy Expression to DD_INSERT.

- Create Filter2, drag CUST_KEY, the comparison keys and CHANGEDFLAG from the Expression into it, and set the filter condition to CHANGEDFLAG.
- Connect an Update Strategy and set the Update Strategy Expression to DD_INSERT.
- Create a Sequence Generator, connect it to an Expression, and connect the NEXTVAL port to the CUST_KEY port of target instances 1 and 2.
- Also create a FLAG port in that Expression, assign it the value 1 and connect it to target instances 1 and 2.
- Create another Expression, drag CUST_KEY into it from Filter2, and create a FLAG port with the value 0.
- Create a third Update Strategy and set the Update Strategy Expression to DD_UPDATE.
- Create three instances of the target. Connect Update Strategy 1 to target 1, Update Strategy 2 to target 2 and Update Strategy 3 to target 3.
- The first pipeline inserts new records into the target, the second inserts the new version of changed records, and the third updates the old version of changed records (setting FLAG to 0). Connect only the ports you want to insert or update in the target table.
- The final mapping looks like the screenshot below.

Note 1 : The lookup condition should be on the logical key column.
Note 2 : The Lookup transformation should contain the logical key, primary key and comparison key columns.
Note 3 : The second pipeline in the mapping contains the comparison key, primary key and changed flag columns.

Note 4 : The Expression transformation should contain only the primary key and comparison keys, along with the related source keys and the new and changed flags.
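The flag logic is spread over three pipelines, so it can help to see it in one place. Purely as an illustration, and not Informatica code, the Python sketch below applies the same rules to in-memory rows. ENAME and SAL are assumed comparison columns (EMPNO is not compared because it is the lookup key), and restricting the lookup to rows with FLAG = 1 is an assumption of this sketch, since the steps above do not say how the lookup picks the current version.

```python
from itertools import count


def scd2_flag(target, source, next_key):
    """Apply SCD Type 2 (flag version) to the in-memory target list.

    target   : list of dicts with CUST_KEY, EMPNO, ENAME, SAL, FLAG
    source   : list of dicts with EMPNO, ENAME, SAL
    next_key : iterator of new surrogate keys (stands in for the Sequence Generator)
    """
    # Lookup on the logical key EMPNO; only current versions (FLAG = 1)
    # are considered, which is an assumption of this sketch.
    current = {row["EMPNO"]: row for row in target if row["FLAG"] == 1}
    inserts, expired_keys = [], set()
    for src in source:
        prev = current.get(src["EMPNO"])
        new_flag = prev is None                    # ISNULL(CUST_KEY)
        changed_flag = prev is not None and (      # NOT ISNULL(CUST_KEY) AND (...)
            prev["ENAME"] != src["ENAME"] or prev["SAL"] != src["SAL"]
        )
        if new_flag or changed_flag:
            # Pipelines 1 and 2: insert a current version with FLAG = 1.
            inserts.append({"CUST_KEY": next(next_key), **src, "FLAG": 1})
        if changed_flag:
            # Pipeline 3: the old version is updated to FLAG = 0 below.
            expired_keys.add(prev["CUST_KEY"])
    for row in target:
        if row["CUST_KEY"] in expired_keys:
            row["FLAG"] = 0
    target.extend(inserts)
    return target


if __name__ == "__main__":
    target = [{"CUST_KEY": 1, "EMPNO": 7369, "ENAME": "SMITH", "SAL": 800, "FLAG": 1}]
    source = [
        {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 950},   # salary changed
        {"EMPNO": 7499, "ENAME": "ALLEN", "SAL": 1600},  # new employee
    ]
    for row in scd2_flag(target, source, count(2)):
        print(row)
```

Running it sets FLAG to 0 on SMITH's old row and adds SMITH (SAL 950) with CUST_KEY 2 and ALLEN with CUST_KEY 3, both with FLAG 1.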

SCD Type 2 Effective Date

SCD Type 2 Effective Date is a method in which historical data is maintained along with current data; BEGIN_DATE and END_DATE columns record the period during which each version was valid.

- Create a mapping and name it.
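- Drag the EMP source into the mapping area.
- Create a Lookup transformation on the target table.
- Define the lookup condition on the primary key column of the source (the logical key), because it is the column that matches rows between source and target.
- Create the Expression (with NEWFLAG and CHANGEDFLAG), Filter1 and Update Strategy 1 (DD_INSERT) in the same way as in the SCD Type 2 Flag mapping.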

- Create Filter2, drag CUST_KEY, the comparison keys and CHANGEDFLAG from the Expression into it, and set the filter condition to CHANGEDFLAG.
- Connect an Update Strategy and set the Update Strategy Expression to DD_INSERT.
- Create another Expression, drag CUST_KEY into it from Filter2, and create an END_DATE port with the expression SYSDATE.
- Create a third Update Strategy and set the Update Strategy Expression to DD_UPDATE.
- Create three instances of the target. Connect Update Strategy 1 to target 1, Update Strategy 2 to target 2 and Update Strategy 3 to target 3.
- The first pipeline inserts new records into the target, the second inserts the new version of changed records, and the third updates the old version of changed records (setting END_DATE). Connect only the ports you want to insert or update in the target table.
- Create a Sequence Generator and connect it to an Expression with a BEGIN_DATE port whose value is SYSDATE. Connect NEXTVAL and BEGIN_DATE to target instances 1 and 2, since both pipelines insert new versions.
- The final mapping looks like the screenshot below.

Note 1 : The lookup condition should be on the logical key column.
Note 2 : The Lookup transformation should contain the logical key, primary key and comparison key columns.
Note 3 : The second pipeline in the mapping contains the comparison key, primary key and changed flag columns.
Note 4 : The Expression transformation should contain only the primary key and comparison keys, along with the related source keys and the new and changed flags.
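The same idea with dates instead of a flag, again as an illustrative Python sketch rather than Informatica code. A version counts as current while its END_DATE is None (NULL), and `load_date` stands in for SYSDATE so the result is repeatable; both are assumptions of the sketch, as are the ENAME and SAL comparison columns.

```python
from datetime import date
from itertools import count


def scd2_effective_date(target, source, next_key, load_date):
    """Apply SCD Type 2 (effective date version) to the in-memory target list.

    A version is current while its END_DATE is None. load_date stands in
    for SYSDATE so that results are repeatable.
    """
    # Lookup on EMPNO, restricted to current (open-ended) versions.
    current = {row["EMPNO"]: row for row in target if row["END_DATE"] is None}
    inserts = []
    for src in source:
        prev = current.get(src["EMPNO"])
        new_flag = prev is None
        changed_flag = prev is not None and (
            (prev["ENAME"], prev["SAL"]) != (src["ENAME"], src["SAL"])
        )
        if new_flag or changed_flag:
            # Pipelines 1 and 2: insert a version that starts on load_date.
            inserts.append({"CUST_KEY": next(next_key), **src,
                            "BEGIN_DATE": load_date, "END_DATE": None})
        if changed_flag:
            # Pipeline 3: close the old version (prev is the same dict as in target).
            prev["END_DATE"] = load_date
    target.extend(inserts)
    return target


if __name__ == "__main__":
    target = [{"CUST_KEY": 1, "EMPNO": 7369, "ENAME": "SMITH", "SAL": 800,
               "BEGIN_DATE": date(2019, 1, 1), "END_DATE": None}]
    source = [{"EMPNO": 7369, "ENAME": "SMITH", "SAL": 950}]
    for row in scd2_effective_date(target, source, count(2), date(2019, 6, 29)):
        print(row)
```

Leaving END_DATE as NULL for the current version is one design choice; some designs use a far-future date instead, which lets the lookup use a simple date comparison.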

Monday, June 24, 2019

Normalizer Transformation

The Normalizer transformation is an active and connected transformation. It converts a single row that contains repeating columns into multiple rows, i.e. it converts a de-normalized table into a normalized table. Unlike most other transformations, you cannot drag and drop columns into a Normalizer; you define its columns on the Normalizer tab.

* The Normalizer is used to convert columns into rows.
* The Normalizer is used in place of the Source Qualifier when reading a mainframe COBOL source.

Steps to create Normalizer transformation :-
-----------------------------------------------------------------
- Go to the Transformations tab and select Normalizer.
- Double-click the Normalizer and select the Normalizer tab.
- Add the columns and set the Occurs value based on the requirement.

The Normalizer Properties tab has two options: Reset and Restart.

Reset : at the end of the session, resets the GK value to the value it had before the session ran.
Restart : starts the GK sequence at 1 each time the session runs.

There are two important generated ports: GK (Generated Key) and GCID (Generated Column ID).

GK generates a sequence number starting from the value defined in the sequence field.

GCID holds the occurrence number of each value.

The Normalizer generates separate rows based on the number of occurrences defined in the transformation.

GK identifies whether output rows belong to the same original record: all rows generated from one input row share the same key value, e.g. 1 for the first input row and 2 for the second.

GCID gives each generated row the number of the occurrence it came from, e.g. 1, 2, 3.
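To see how GK and GCID relate, here is a small Python sketch that mimics the Normalizer's output for a column that occurs three times. It only models the output rows; STORE, Q1 to Q3 and VALUE are hypothetical names, and the real transformation names its generated ports itself.

```python
def normalize(rows, key_field, occurs_fields, gk_start=1):
    """Emit one output row per occurrence of the repeating column.

    GK   : identical for all rows generated from one input row; it increases
           by 1 for each input row, starting at gk_start.
    GCID : 1-based occurrence number of the repeating column.
    """
    out = []
    for gk, row in enumerate(rows, start=gk_start):
        for gcid, field in enumerate(occurs_fields, start=1):
            out.append({key_field: row[key_field], "VALUE": row[field],
                        "GK": gk, "GCID": gcid})
    return out


if __name__ == "__main__":
    # One row per store with quarterly sales spread over three columns (Occurs = 3).
    sales = [
        {"STORE": "S1", "Q1": 100, "Q2": 200, "Q3": 300},
        {"STORE": "S2", "Q1": 10, "Q2": 20, "Q3": 30},
    ]
    for row in normalize(sales, "STORE", ["Q1", "Q2", "Q3"]):
        print(row)
```

The two input rows become six output rows: GK runs 1, 1, 1, 2, 2, 2 and GCID runs 1, 2, 3, 1, 2, 3.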

Sunday, June 23, 2019

Types Of Keys In Database

Primary Key :-
---------------
A primary key is a single field or combination of fields that uniquely identifies a row in the table.

The following rules apply to a primary key column:
- A primary key column cannot contain a NULL value
- A primary key value must be unique within the entire table
- A primary key value should not be changed over time

Foreign Key :-
---------------
A foreign key means that values in one table must also appear in another table.
The referenced table is called the parent table while the table with the foreign key is called the child table.
The foreign key in the child table will generally reference a primary key in the parent table.
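These rules are enforced by the database itself. As a minimal sketch, Python's built-in sqlite3 module can show a primary key rejecting a duplicate value and a foreign key rejecting a child row whose parent does not exist. The dept and emp tables are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build_schema():
    """Create a parent table (dept) and a child table (emp) in memory."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless this pragma is set.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
    conn.execute(
        "CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT,"
        " deptno INTEGER REFERENCES dept(deptno))"
    )
    conn.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING')")
    conn.execute("INSERT INTO emp VALUES (7782, 'CLARK', 10)")
    return conn


def violates(conn, sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


if __name__ == "__main__":
    conn = build_schema()
    print(violates(conn, "INSERT INTO dept VALUES (10, 'DUPLICATE')"))  # duplicate primary key
    print(violates(conn, "INSERT INTO emp VALUES (7900, 'JAMES', 99)"))  # no parent row 99
```

Deleting a parent row that still has children is rejected for the same reason.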

Natural Key and Surrogate Key:-
------------------------------------
Sometimes the primary key is made up of real data and these are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key.

A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. When I say “real data” I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record.

Surrogate keys are keys that don’t have a natural relationship with the rest of the columns in a table. A surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time, right before the record is inserted into the table.
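A short sqlite3 sketch of the difference: `cust_key` is a surrogate key that the database generates at insert time, while `email` is treated as the natural key and protected by a UNIQUE constraint. The table, column names and sample data are hypothetical.

```python
import sqlite3


def load_customers(names_and_emails):
    """Load (name, email) pairs into an in-memory customer table."""
    conn = sqlite3.connect(":memory:")
    # cust_key : surrogate key, generated by the database when the row is inserted.
    # email    : natural key, real data that also identifies the customer.
    conn.execute(
        "CREATE TABLE customer ("
        " cust_key INTEGER PRIMARY KEY AUTOINCREMENT,"
        " email TEXT NOT NULL UNIQUE,"
        " name TEXT)"
    )
    for name, email in names_and_emails:
        # cust_key is not supplied; SQLite assigns it.
        conn.execute("INSERT INTO customer (email, name) VALUES (?, ?)", (email, name))
    return conn


if __name__ == "__main__":
    conn = load_customers([("Ann", "ann@example.com"), ("Bob", "bob@example.com")])
    print(conn.execute("SELECT cust_key, email, name FROM customer").fetchall())
```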

Sunday, May 26, 2019

Mapplet & Worklet

Mapplet :-

----------

If you want to use the same logic in multiple mappings, use a mapplet. Instead of rebuilding the logic every time, we can create it once in a mapplet and use it in many mappings.

- Go to the Mapplet Designer and create Mapplet Input and Mapplet Output transformations.

- Build the mapping logic between the Mapplet Input and Mapplet Output.

- Use this mapplet in any mapping wherever the logic is required.

- To see the mapplet logic inside a mapping, open the mapping and click the Expand/Unexpand option. Select the correct mapplet from the list and click OK.

Worklet :-
----------
We use worklets to group sessions and manage the dependencies between them.
If a workflow has many sessions, the dependencies among them are difficult to manage. Instead of adding dependencies between individual sessions, we group the sessions into worklets and create the dependencies between the worklets.

- Create a worklet, name it, and add and connect sessions in it.

- In the Workflow Designer, drag the worklets from the navigator on the left and connect them together.

Reusable Transformation :-

--------------------------------

If we want to use the same transformation again and again across mappings, we can create a reusable transformation.

Double-click the transformation and check the 'Make reusable' option to convert it into a reusable transformation.

Once it is converted to reusable, we cannot edit it in the mapping; we need to go to the Transformation Developer to change it. Any transformation created in the Transformation Developer is reusable by default.

Partitioning in Informatica

Partitioning is used for parallel processing, to reduce the time needed to load data into the target. The partition types are pass-through, round-robin, hash user keys, hash auto-keys, key range and database partitioning.
Each type of partition works according to its own logic.

- The partitioning options are in the session: open the Mapping tab and click 'Partitions' in the bottom-left corner, beside the 'Transformations' option.

- This is how the mapping looks after we develop the mapping code.

Pass-through : This is the default partition type. The Integration Service processes data without redistributing rows among partitions: all rows in a single partition stay in that partition after crossing a pass-through partition point. Choose pass-through partitioning when we want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.

Round-robin : The Integration Service distributes data evenly among all partitions. Use round-robin partitioning when we want each partition to process approximately the same number of rows, i.e. for load balancing.

Hash auto-keys : The Integration Service uses a hash function to group rows of data among partitions, based on a partition key that it builds automatically: it uses all grouped or sorted ports as a compound partition key. We may need hash auto-keys partitioning at Rank, Sorter and unsorted Aggregator transformations.

Hash user keys : The Integration Service uses a hash function to group rows of data among partitions. We define which ports make up the partition key.

Key range : We define which range of key values each partition processes. The Integration Service distributes rows of data based on a port or set of ports that we define as the partition key. For each port, we define a range of values, and the Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.

Database : The Integration Service queries the database system for table partition information and reads partitioned data from the corresponding nodes in the database, so the rows are distributed according to the partitions defined in the database.
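The distribution rules can be illustrated outside Informatica. The Python sketch below models only how rows are split among partitions for round-robin, hash (the idea behind both hash auto-keys and hash user keys) and key range partitioning; pass-through is left out because it does not redistribute rows. `zlib.crc32` is a stand-in hash function, not the Integration Service's own, and the EMP-style columns are hypothetical.

```python
import zlib
from itertools import cycle


def round_robin(rows, n):
    """Deal rows to n partitions in turn, so each gets about the same count."""
    parts = [[] for _ in range(n)]
    for part, row in zip(cycle(parts), rows):
        part.append(row)
    return parts


def hash_partition(rows, n, key_ports):
    """Rows with equal values in key_ports always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        key = repr(tuple(row[p] for p in key_ports)).encode()
        parts[zlib.crc32(key) % n].append(row)
    return parts


def key_range(rows, key_port, ranges):
    """ranges holds one (low, high) pair per partition: low inclusive, high
    exclusive. Rows outside every range are simply dropped in this sketch."""
    parts = [[] for _ in ranges]
    for row in rows:
        for part, (low, high) in zip(parts, ranges):
            if low <= row[key_port] < high:
                part.append(row)
                break
    return parts


if __name__ == "__main__":
    emp = [{"EMPNO": e, "DEPTNO": d, "SAL": s} for e, d, s in
           [(1, 10, 800), (2, 20, 1600), (3, 10, 2500), (4, 30, 3000), (5, 20, 950)]]
    print([len(p) for p in round_robin(emp, 2)])
    print([[r["DEPTNO"] for r in p] for p in hash_partition(emp, 3, ["DEPTNO"])])
    print([len(p) for p in key_range(emp, "SAL", [(0, 1000), (1000, 2000), (2000, 5000)])])
```

Round-robin balances row counts but scatters equal keys, while hashing keeps each DEPTNO together, which is why grouping transformations such as Aggregator and Rank need a hash-based type.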

Points to consider while using Informatica partitions :-
-----------------------------------------------------------
* We cannot specify a partition point at a Sequence Generator.

* We should place a Sorter before a partitioned Joiner; otherwise the session fails.

* We cannot create a partition key for hash auto-keys, round-robin or pass-through partitioning.

* If a bitmap index is defined on the target and you use pass-through partitioning to, say, update the target table, the session might fail, because the bitmap index causes serious locking problems in this scenario.

* Partitioning considerably increases the total DTM buffer memory requirement for the job. Ensure you have enough free memory to avoid memory allocation failures.

* With pass-through partitioning, Informatica opens multiple connections to the database server. Ensure that the database is configured to accept a high number of connection requests.

* As an alternative to partitioning, you may also use native database options to increase the degree of parallelism (DOP) of query processing. For example, in an Oracle database you can either specify the PARALLEL hint or alter the DOP of the table in question.

* If required, you can even combine Informatica partitioning with native database parallel options, e.g. create 5 pass-through pipelines, each sending a query to the Oracle database with a PARALLEL hint.

Default partition points are at the Source Qualifier, Aggregator and Target. Partition types by transformation:
SQ : Pass-through
Sorter : Hash auto-keys or pass-through
Router :
Lookup : Any partition type
Expression : Pass-through
Aggregator : Hash auto-keys
Target : Round-robin or pass-through
Joiner : 1:n partitioning, which has its own logic (one partition for the master source, multiple partitions for the detail source).