Basic Informatica Tutorial

Monday, June 24, 2019

Normalizer Transformation

Normalizer Transformation is Active and Connected transformation. It converts single row data into multiple columns data. It converts de-normalized table into a normalized table. You cannot drag & drop columns to normalizer transformation like the rest of the transformations.

* Normalizer is used to convert rows into columns.
* Normalizer is used in the place of source qualifier while reading mainframe or Cobal Source.

Steps to create Normalizer transformation :-
-----------------------------------------------------------------
- Go to tranformations tab and select normalizer
- Double click normalizer and select the normalizer tab
- Add the occurs based on the requirement

In Normalizer properties tab 'Reset' and 'Restart' are the two options available.

Reset is used to reset the Gk value to the value, that is used before the session.
Restart is used to start Gk sequence from 1 and restart for each session

There 2 important ports available and they are GK (Generated Key) and GCID (Generated Column ID)

GK generates sequence number starting from the value defined in the sequence field.

GCID hold the value of the occurrence field.

Normalizer generates seperate rows based on the occurances we put in the transformation.

GK is used to identify whether the records belongs to the same original record and allocates key or values like 1 or 2.

GCID is used to give column id's or values to the generated columns or occurances like 1,2,3.

Sunday, June 23, 2019

Types Of Keys In Database

Primary Key :-
---------------
A primary key is a single field or combination of fields that uniquely identifies a row in the table.

The following are rules that make a column a primary key:
- A primary key column cannot contain a NULL value
- A primary key value must be unique within the entire table
- A primary key value should not be changed over time

Foreign Key :-
---------------
A foreign key means that values in one table must also appear in another table.
The referenced table is called the parent table while the table with the foreign key is called the child table.
The foreign key in the child table will generally reference a primary key in the parent table.

Natural Key and Surrogate Key:-
------------------------------------
Sometimes the primary key is made up of real data and these are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key.

A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. When I say “real data” I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record.

keys that don’t have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time right before the record is inserted into a table.

Sunday, May 26, 2019

Mapplet & Worklet

Mapplet :-

----------

If you want to create same logic in multiple mapping we use Mapplets. Instead of creating logic every time, we can create it in a single Mapplet and use it in many mappings.

- Go to Mapplet designer, create Mapplet input and output from transformations

- Apply the mapping logic between input and output Mapplet.

- Now apply this logic in mapping where ever there is requirement.

- If you want to show mapplet logic in the mapping just go to mapping click expand/unexpand option. Select correct one from the list of mapplets & click ok.

Worklet :-
----------
If you want to create dependencies between the workflows or sessions, we use worklets.
If you have many sessions in a mapping its very difficult to manage with dependencies, instead of adding dependencies among various session we use Worklets to make our work easy by creating dependencies among different Worklets.

- Create a worklet, name it and add connect sessions to it.

- Drag worklet from left and connect all of them together in the workflow designer.

Reusable Transformation :-

--------------------------------

If we want to use any transformation again & again in the mapping we can create reusable transformation.

Double click transformation & check option reusable to make it reusable transformation.

We cannot edit it once its converted into reusable. We need to go to Transformation developer to change anything if required. If any transformation is created in Transformation developer it will be by default a reusable transformation

Partitioning in Informatica

It's used for parallel processing in order to decrease the time to load data into target. Different partitions are pass through, round robin, hash user key, auto user key, key range and finally database partition.
Each type of partition works according to its own logic.

- Partitioning option is found in session, click on mapping and you can find an option in bottom left, beside transformation option as 'Partitions', Click on it.

- Look of mapping after we develop a mapping code

Pass through : Its default partition. It distributes the rows sequentially to all the partitions. The Integration Service processes data without redistributing rows
among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when we want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.

Round robin : The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where we want each partition to process approximately the same numbers of rows i.e. load balancing.

Hash Auto-key : It generates a hash key value by which it distributes rows according to its hash key logic to the partitions.The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key.The Integration Service uses all grouped or sorted ports as a compound partition key. We may need to use hash auto-keys partitioning at Rank, Sorter and unsorted Aggregator transformations.

Hash userkey : The Integration Service uses a hash function to group rows of data among partitions. We define the number of ports to generate the partition key.

Key range : We need to give the key range like which partition should process how many rows. Its completely according to our logic. The Integration Service
distributes rows of data based on a port or set of ports that we define as the partition key. For each port, we define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.

Database : It automatically checks the database like how many partitions are available and based on the it distributes rows to the partitions.The Integration
Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.

Points to consider while using Informatica partitions :-
-----------------------------------------------------------
* We cannot specify partition for Sequence generator.

* We should specify sorter before joiner otherwise the session fails.

* We cannot create a partition key for hash auto-keys, round-robin, or pass-through types partitioning.

* If you have bitmap index defined upon the target and you are using pass-through partitioning to, say Update the target table - the session might fail as bitmap index creates serious locking problem in this scenario.

* Partitioning considerably increases the total DTM buffer memory requirement for the job. Ensure you have enough free memory in order to avoid memory allocation failures.

* When you do pass-through partitioning, Informatica will try to establish multiple connection requests to the database server. Ensure that database is configured to accept high number of connection requests.

* As an alternative to partitioning, you may also use native database options to increase degree of parallelism of query processing. For example in Oracle database you can either specify PARALLEL hint or alter the DOP of the table in subject.

* If required you can even combine Informatica partitioning with native database level parallel options - e.g. you create 5 pass-through pipelines, each sending query to Oracle database with PARALLEL hint.

Default Partitions are SourceQualifier, Aggregator, Target.
SQ : Pass though
Sorter : Hash Auto key
Router :
Lookup : Any partition
Expression : Pass through
Aggregator : Hash Auto key
Sorter : Pass through
Target : Round robin or Pass through
Joiner : 1:n Its has a different logic of partitiong.

Friday, May 24, 2019

Practise Scenarios

1) How to convert single row from source to three rows in target

- In normalizer create column1 and increase occurs to 3 so that it create 3 inputs

2) How to Split the non-key columns to separate tables with key column in both

3) How to separate duplicate and non duplicate records to a separate table.

- In sorter transformation sort on first port in ascending order
- In aggregator transformation create a new port and check group by port on column1, then take count on same column as count(address1).
- In router create two groups and give conditions as count = 1 and count >1.

4) How to Retrieving first and last record from a table/file.

- Give rank as top and bottom for both the rank transfomations and set number of ranks as 1.
- Create two instances of target.

5) How to remove footer from your file ???????????

- In filter give the condition as instr(col1,'footer')=0, which removes the word footer from the footer file.

6) How to send 1st half rows to target1 and second half rows to target 2 ?

- Generate count in aggregator and group by on empno
- Generate seqno in expression
- Use joiner to join both aggregator and expression
- In router give two conditions as ' o_seqno <= count_rec/2' and 'o_seqno >= count_rec/2

7) How to send alternate records to 2 different targets ?

- Generate sequence and connect to expression
- In expression transformation, add two ports as Even and Odd.
Give the syntax for Odd port as ' NEXTVAL%2 = 1' & Even port as 'NEXTVAL % 2 = 0'.
- In router transformation, create two groups Even and Odd. Give the filter condition for Even as ' Even = 1' and for Odd as ' Odd = 1'.

8) How to send record into target in cyclic order ?

- In sequence generator set start value as 1, end value as 3 and check cycle option to repeat 1 to 3 cycle.
- In router create 3 groups and give condition as Nextval = 1, Nextval = 2, Nextval = 3 and connect outputs to target.

9) How to generate target file names dynamically for flat files?

- Go to the target and select top right 'F' option, a new column will be generated as filename.

- In expression create a 'Dynamic file' name port and give its any name like 'Test_file' or ' To_char(sessstarttime,'yyyymmdd24').
- Connect Dynamic file to filename port in the target.

10) How to process 'CurrentlyProcessedFileName' into the target?

After importing source definition, go to properties tab and select the option 'add Currently Processed Flat File Name'. As a result the filename will be processed into the target.

11) How to remove special characters from a string ?

- In expression transformation create a o_name port and give a synatax as
'REG_REPLACE (NAME,'[^A-Z0-9a-z ]','').

- This expression removes all the special characters except Captial letters, Small letters and numbers.

SELECT regexp_replace('Removing_Special - Characters .,_-$%&^& ',

'[^0-9 A-Za-z]', '');

Update emp2 set Ename = Regexp_Replace (Ename,'[^A-Z0-9 ]', '');

12) How to load the last 3 rows from a flat file into a target?

- Sort on Empno port and connect to Expression transformation as well as Aggregator transformation
- Create dummy ports in both the expression and Aggregator, inorder to join both the transformation with joiner
- Create count port in aggregator transformation
- Check the sorted input option in both the aggregator and joiner transformations
- In joiner give condition as dummy= dummy1
- In router or filter give the condition as Count - o_seqno <=2, as a result last 3 rows will be inserted into the target.

Rank transformation can also be used here to make it simple.

- Select bottom 3 option in rank transformation & connect it to target.

13) How to load the first 3 rows into target1, last 3 rows into a target2 and remaining rows into target3 ?

- This logic is almost same as previous one, but additional conditions and targets are added as per the requirement
- In router first condition should be O_SEQ_NO <= 3, second condition should be COUNT - O_SEQ_NO <= 2
- Now connect all to the targets 1,2 and 3
- Default group will return remaining rows, which are not mentioned in condition

14) Scenario, if you have source with 10 records and 3 target table, your target1 output should be 1-2 records, target2 output should be 2-7 records, target3 output should be 8-10 records.

- Sort on Empno port and connect to Expression transformation as well as Aggregator transformation
- Create dummy ports in both the expression and Aggregator, in order to join both the transformation with joiner
- Create count port in aggregator transformation
- Check the sorted input option in both the aggregator and joiner transformations
- In joiner give condition as dummy= dummy1
- In router, create two different groups and name it as First and Last
- Give the First group condition as Seqno <=2, Last group condition as Count - o_seqno <=2 as a result last 3 rows will be inserted into the target
- Finally connect default port to target 3, remaining records will be loaded into the target

Scheduling Tools In Informatica

Scheduling tools in informatica are used to schedule the jobs. It means these scheduling tools are used to automate workflows, the time specified by us.

Informatica scheduler is basic scheduling tool which is present as a default feature for scheduling.

- Go to edit workflow & select scheduler option. Select Non-Reusable or Re-usable based on your requirement.

- Click scheduler option available on the right for automation

- Put all the necessary details for scheduling purpose & save it.

- If we try to execute manually it fails because its scheduled on a particular time to execute.

But there are many drawbacks for this, so many third party schedulers are being introduce to fulfill the requirements. Some the schedulers are Autosys, Tivoli and Control M Schedulers.

Lets learn about the Tivoli Scheduling tool.

Thursday, May 23, 2019

Session parameters & Session variables

Session Parameter :-
----------------------
Session parameters, like mapping parameters, represent values you might want to change between sessions, such as a database connection or source file.

Use session parameters in the session properties, and then define the parameters in a parameter file.

You can specify the parameter file for the session to use in the session properties.

You can also specify it when you use pmcmd to start the session.The Workflow Manager provides one built-in session parameter, $PMSessionLogFile.With $PMSessionLogFile, you can change the name of the session log generated for the session.

The Workflow Manager also allows you to create user-defined session parameters.

Naming Conventions for User-Defined Session Parameters :

Parameter Type Parameter Connection
------------------ ------------------------
Database Connection $DBconnectionName
Source File $InputFileName
TargetFile $OutputFileName
Lookup File $LookupFileName
Reject File $BadFileName

Lets create a Mapping with session parameters for Oracle DB and execute it.

- Create any mapping and a workflow
- Double click the session, edit the connection 'source_system' and create a parameter connect as in the image.

- Create a parameter file as in the image
- First we give the folder name, next we give session name followed by the session parameter and its connection without spaces in between.

- Give the parameter file location along with name in the session properties.

- Now execute the workflow.

Lets create a Mapping with session parameters for Flat files and execute it.

- Create any mapping and a workflow
- Double click the session and give the source name as $InputFile.

- Create a parameter file as in the image
- First we give the folder name, next we give session name followed by the session parameter and Flat File location.

- Now execute the workflow.

Scenario 2 :-
-------------
Use session parameters to make sessions more flexible.
For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases.

You want to use the same mapping for both tables. Instead of creating two sessions for the same mapping, you can create a database connection parameter, $DBConnectionSource, and use it as the source database connection for the session.

When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After the session completes, you set $DBConnectionSource to TransDB2 and run the session again.

You might use several session parameters together to make session management easier. For example, you might use source file and database connection parameters to configure a session to read data from different source files and write the results to different target databases. You can then use reject file parameters to write the session reject files to the target machine. You can use the session log parameter, $PMSessionLogFile, to write to different session logs in the target machine, as well.

When you use session parameters, you must define the parameters in the parameter file. Session parameters do not have default values. When the PowerCenter Server cannot find a value for a session parameter, it fails to initialize the session.

Session Variables :-
---------------------