Wednesday, May 22, 2019

Mapping Parameters & Mapping variables

Mapping Parameters :-
------------------------
Parameters are used to minimize hard coding in ETL, as they avoid having to change values inside the ETL code. Mapping parameters are values which remain constant throughout the entire session. We supply the values for mapping parameters in a parameter file.

After we create a parameter or variable, it appears in the Expression Editor, and we can use it in any expression in the mapping. Before we run a session, we define the parameter value in a parameter file for the session; here we can use any constant value. If the parameter is not defined in the parameter file, the Integration Service uses the user-defined initial value for the parameter. If the initial value is not defined, the Integration Service uses a default value based on the datatype of the mapping parameter.


A parameter file contains two sections :-
------------------------------------
Header section / Global variables :
It identifies the Integration Service, folder, workflow, worklet or session to which the parameters and variables apply.

Definition section / Local variables :
It contains the list of parameters and variables with their assigned values.

So let's create a simple mapping that loads source data to a target table and passes the value through the parameter file.





Now we give the condition in the Source Qualifier as EMP1.DEPTNO = $$DEPTNO.
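For reference, the override in the Source Qualifier's SQL query might look like the sketch below; the EMP1 column names are assumptions for illustration:

SELECT EMP1.EMPNO, EMP1.ENAME, EMP1.SAL, EMP1.DEPTNO
FROM EMP1
WHERE EMP1.DEPTNO = $$DEPTNO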




Now create a parameter file for the mapping 
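A minimal parameter file for this mapping might look like the sketch below; the folder, workflow and session names are hypothetical placeholders:

[MyFolder.WF:wf_load_emp.ST:s_m_load_emp]
$$DEPTNO=20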




Now we have to give the parameter file location.

Go to the Mappings tab of the session and, on the left, find the option 'Files, Directories and Commands' and click on it. A new screen appears; under the 'Parameter Filename' option, give the path of the parameter file, including the file name.
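For example, the value entered under 'Parameter Filename' could be a full path like the following; the directory and file name are hypothetical:

/opt/informatica/parameters/wf_load_emp.param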




So let us execute the workflow and see the output.




Mapping variables :-
---------------------
Mapping variables are mainly used for incremental loads, i.e. to read the source incrementally. If the source table contains timestamped transactions and we want to evaluate transactions on a daily basis, we use mapping variables.



Mapping variables are values that can change between sessions.

The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.

The Integration Service holds two different values for a mapping variable during a session run:

- Start value of a mapping variable
- Current value of a mapping variable

Start Value :-
------------
The start value is the value of the variable at the start of the session. The start value could be a value defined in the parameter file for the variable, a value assigned in the pre-session variable assignment, a value saved in the repository from the previous run of the session, a user-defined initial value for the variable, or the default value based on the variable datatype. The Integration Service looks for the start value in the following order:

- Value in parameter file
- Value in pre-session variable assignment
- Value saved in the repository
- Initial value
- Datatype default value

For example, you create a mapping variable in a mapping or mapplet and enter an initial value, but you do not define a value for the variable in a parameter file. The first time the Integration Service runs the session, it evaluates the start value of the variable to the configured initial value. The next time the session runs, the Integration Service evaluates the start value of the variable to the value saved in the repository. If you want to override the value saved in the repository before running a session, you need to define a value for the variable in a parameter file. When you define a mapping variable in the parameter file, the Integration Service uses this value instead of the value saved in the repository or the configured initial value for the variable.


When you use a mapping variable such as $$MAPVAR in an expression, the expression always returns the start value of the mapping variable. If the start value of $$MAPVAR is 0, then $$MAPVAR returns 0. To return the current value of the mapping variable, use the following expression: SETVARIABLE($$MAPVAR, NULL).


Current Value :-
--------------
The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. As the session progresses, the Integration Service calculates the current value using a variable function that you set for the variable. Unlike the start value of a mapping variable, the current value can change as the Integration Service evaluates the current value of a variable as each row passes through the mapping. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository. The Integration Service states the value saved to the repository for each mapping variable in the session log.

If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.


The transformation language provides the following variable functions to use in a mapping:

- SetMaxVariable
- SetMinVariable
- SetCountVariable
- SetVariable
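Their general forms are shown below; $$Variable stands for any mapping variable, and 'value' is the port or expression evaluated for each row:

SETMAXVARIABLE($$Variable, value)
SETMINVARIABLE($$Variable, value)
SETCOUNTVARIABLE($$Variable)
SETVARIABLE($$Variable, value)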

The order in which the Integration Service encounters variable functions in the mapping may not be the same for every session run. This may cause inconsistent results when you use the same variable function multiple times in a mapping.


The Integration Service does not save the final current value of a mapping variable to the repository when any of the following conditions are true:

- The session fails to complete.
- The session is configured for a test load.
- The session is a debug session.
- The session runs in debug mode and is configured to discard session output.

To create a mapping variable, go to the Mappings tab, select the 'Parameters and Variables' option, and add the mapping variable there.





Now we give the condition in the Source Qualifier as EMP1.DEPTNO > $$DATE.




We define the mapping variable in the expression editor of the Expression transformation, as in this image.
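Consistent with the filter above, the Expression transformation could evaluate a port like the one below, so that $$DATE keeps the highest DEPTNO processed so far and the next run reads only rows beyond it; a sketch for this toy example:

SETMAXVARIABLE($$DATE, DEPTNO)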




After executing the workflow, the latest value is saved into $$DATE.

Each time we execute the workflow, the new value replaces the old value of $$DATE. As a result we get a persistent value, as in the image.





So with the help of mapping variables we can perform incremental loads.

We can reset the persistent value if we want to load the data from the beginning.

Roles and Responsibilities In My Project

Banking Domain Client :-
---------------------------
My client is ''. We work on customer data and employee data.

Employee Data :
If any employee resigns, joins, gets a salary increment, or transfers to another department, the employee data changes.
So we update the Employee dimension target tables with SCD Type II techniques.
We load the fact tables after loading the dimension tables.

Facts & Dimension Involved are :
Date_Dim, State_Dim, Country_Dim, AccountType_Dim, Company_Dim, Currency_Dim, Bank_Dim
Balance_Fact, AccountID_Fact, CompanyID_Fact, BankID_Fact, CountryID_Fact


Roles and responsibilities in your project :-
---------------------------------------
1) The data we get will be in the format of flat files, SQL Server or Oracle.

- We build change data capture mappings with mapping variables whenever there are timestamped transactions.
- Sometimes we follow incremental aggregation logic, when we have incremental data.
- We follow the indirect method, when we have multiple flat files.
- We follow the 'configure concurrent workflow' approach, when new flat files arrive now and then.
- We also use session partitioning.


2) I work on mapping code errors and respond to them until the problem is resolved.

- For connection problems, like a source database change, we correct it in the parameter files.
- If a workflow runs for a long time, we follow performance tuning techniques.
- If code suddenly stops working, we check the complete mapping code and correct it.
- We use the debugger to debug a mapping.


3) We update dimensions whenever there is a requirement; mostly the requirement comes at the end of every month.

- Using a cust_key surrogate key in the target
- Using the MD5 function
- Using a dynamic lookup


4) I apply error handling techniques to ensure no data is missing.


5) I perform performance tuning if a session is taking a long time.

6) We execute workflows on Unix with a wrapper script:
  (pmcmd startworkflow -sv <Integration Service Name> -d <Domain Name> -u <Integration Service Username> -p <Password> -f <Folder Name> <Workflow>)
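A minimal wrapper script might look like the sketch below; the service, domain, user, folder and workflow names are hypothetical placeholders:

#!/bin/sh
# Start the workflow and wait for it to complete
pmcmd startworkflow -sv IS_DEV -d Domain_Dev -u infa_user -p infa_pwd -f BANK_FOLDER -wait wf_load_emp
rc=$?
if [ $rc -ne 0 ]; then
    echo "Workflow failed with return code $rc"
    exit $rc
fi
echo "Workflow completed successfully"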

7) We schedule and monitor jobs in Tivoli Workload Scheduler.

Debugger to Troubleshoot Etl Logic

The Debugger is an interface integrated with the Mapping Designer. Informatica executes the mapping based on the transformation logic; many times we get data into the target that is not expected, but with the help of the debugger we can analyse the data row by row.

Before starting the debugger we need to select the session instance: whether we want to use a new session or an existing session.




Whenever we invoke the debugger, a few windows are displayed to show debugging information, such as the Instance window, Target window, Mapping window and Debugger log. We can monitor the debugger through these windows.




Target Window :- This shows how the data is processed into the target. We can see whether each record is going to be inserted, updated, deleted or rejected. If there are multiple target instances, select the instance name from the drop-down.

Instance Window :- This shows how the data is transformed at a transformation instance; the window refreshes as the debugger progresses from one transformation to another. We can choose a transformation from the drop-down menu and see what the data looks like at that particular transformation instance for a particular source row.


Break point in Debugger :-
----------------------------
Whenever we look for particular data, instead of checking every instance of the source data, we can define breakpoints. A breakpoint is a condition defined while configuring the debugger; when the condition becomes true, the debugger halts at that particular instance.
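For example, a breakpoint could be defined on a transformation instance with a data condition like the one below (the instance name is hypothetical); the debugger then pauses only on rows from department 30:

Instance  : SQ_EMP1
Condition : DEPTNO = 30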



Use the 'Continue' option in the debugger to run until the breakpoint condition is true. By using 'Continue', we don't need to manually press the F10 key to step to the next row.


Error Handling Techniques

There are two types of errors: fatal errors and non-fatal errors.

A fatal error forces the session to fail.
This happens when the session is unable to access the source, target or repository.

A non-fatal error does not force the session to fail.
It happens while reading data from the source to the target; rejected records come under this category.

To capture transformation-level errors, we need to set the 'Error Log Type' property on the Config Object tab of the session properties.


And we choose a DB connection along with the connection information.



As a result, 4 error tables are created in the database, as below :-


PMERR_DATA : Stores data and metadata about a transformation row error and its corresponding source rows.
PMERR_MSG : Stores metadata about an error and the error message.
PMERR_SESS : Stores metadata about the session.
PMERR_TRANS : Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs. With this we can verify the error that is causing the mapping to fail.
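Once populated, these are ordinary relational tables and can be queried directly; for example (the names shown are the defaults and may carry a table-name prefix configured in the session):

SELECT * FROM PMERR_MSG;
SELECT * FROM PMERR_DATA;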


Understanding Error Log File :-
---------------------------------
Error Type : 
1 - Reader Error
2 - Writer Error
3 - Transformation Error

Source Row Type :
0 - Insert
1 - Update
2 - Delete
3 - Reject

Source Data :
D - Valid
O - Overflow
N - Null
T - Truncated
B - Binary
U - Data unavailable

Transformation Data :
D - Valid
O - Overflow
N - Null
T - Truncated
B - Binary
U - Data unavailable


Apart from these, there are user-defined exceptions; we will learn about them soon.


Stored Procedure Transformation

The Stored Procedure transformation is a passive transformation.
It can be both connected and unconnected.
Procedures are stored in the database, and we call a stored procedure through the Stored Procedure transformation in Informatica.
To perform calculations like salary hikes or minimum and maximum salaries, we use the Stored Procedure transformation.

Examples :-
------------
If an application updates the customer table in ten different places, there can be a single stored procedure and a standard procedure call from the application for this functionality.
If a stored procedure has four SQL statements in the code, then there only needs to be a single call to the database instead of four calls for each individual SQL statement.
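As a sketch, a procedure for the salary-hike calculation mentioned above could look like the following Oracle PL/SQL; the procedure name and the 10% hike are assumptions for illustration:

CREATE OR REPLACE PROCEDURE SP_SAL_HIKE (
    p_empno   IN  NUMBER,   -- mapped to an input port in the transformation
    p_new_sal OUT NUMBER    -- mapped to an output port in the transformation
)
AS
BEGIN
    -- Return the employee's salary increased by 10%
    SELECT SAL * 1.10
      INTO p_new_sal
      FROM EMP
     WHERE EMPNO = p_empno;
END;
/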


Three types of ports :-
----------------------
Input : Passes input parameters to the stored procedure.
Output : Receives the output values produced by the procedure defined in the database.
Return : Receives the return value of the stored procedure.


Stored Procedure Type :-
--------------------------
There are 5 different options available in the Stored Procedure transformation properties.

- Normal : Runs on a row-by-row basis.
- Source Pre Load : Works like the 'Pre SQL' property of the Source Qualifier; it acts on the database before the data is read from the source.
- Source Post Load : Works like the 'Post SQL' property of the Source Qualifier; it acts on the database after the source data is read.
- Target Pre Load : Works like a 'Pre SQL' on the target; it acts on the database before the data is written to the target.
- Target Post Load : Works like a 'Post SQL' on the target; it acts on the database after the load completes.


To create a Stored Procedure transformation :-

- Go to the Transformations menu and click the 'Import Stored Procedure' option.
- Connect to the database where the stored procedure is stored and select it.
- Now connect the appropriate ports to the input and output ports.




- Select the connection information 



- Now execute the workflow; we get the expected results.

Friday, May 10, 2019

Loops in Stored Procedures

Whenever we execute a procedure, each query or statement executes once.
A loop is a programming concept used to execute a group of statements repeatedly a specific number of times.

3 Types of Loops :-
- Simple Loop
- For Loop
- While Loop
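As a sketch in Oracle PL/SQL, the three loop styles look like this:

DECLARE
    i NUMBER := 1;
BEGIN
    -- Simple loop: exits via an explicit EXIT WHEN condition
    LOOP
        DBMS_OUTPUT.PUT_LINE('Simple loop: ' || i);
        i := i + 1;
        EXIT WHEN i > 3;
    END LOOP;

    -- FOR loop: the counter and bounds are declared in the loop header
    FOR j IN 1 .. 3 LOOP
        DBMS_OUTPUT.PUT_LINE('For loop: ' || j);
    END LOOP;

    -- WHILE loop: the condition is tested before each iteration
    i := 1;
    WHILE i <= 3 LOOP
        DBMS_OUTPUT.PUT_LINE('While loop: ' || i);
        i := i + 1;
    END LOOP;
END;
/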

Thursday, May 9, 2019

Aggregate functions

Aggregate functions :-
The functions SUM, COUNT, AVG, MIN and MAX are the common aggregate functions; their results do not depend on the order of the records.

Aggregate functions return a single output by calculating values across a table or column. When used with a GROUP BY clause, they return one value for each group.

SELECT DEPTNO, COUNT(*) DEPT_COUNT FROM EMP 
WHERE DEPTNO IN (20, 30)
GROUP BY DEPTNO;

DEPTNO                 DEPT_COUNT             
---------------------- ---------------------- 
20                     5                      
30                     6 


SUM() function :-
It returns the sum of a column. The column should be numeric. 

SELECT SUM(SAL) FROM EMP;
SELECT EMPNO, SUM(SAL) FROM EMP GROUP BY EMPNO;


COUNT() function :-
It returns/counts the number of rows in a query. COUNT(*) counts all rows, while COUNT(column) does not count NULL values in that column.

SELECT COUNT(*) FROM EMP;
OR 
SELECT COUNT(EMPNO) FROM EMP;
SELECT EMPNO, COUNT(EMPNO) FROM EMP GROUP BY EMPNO;


AVG() function :-
It returns the average value of a column. The column should be numeric.

SELECT AVG(SAL) FROM EMP;
SELECT EMPNO, AVG(SAL) FROM EMP GROUP BY EMPNO;


MIN() function :-
It returns the minimum or smallest value of a column. The column can be numeric, character or date.

SELECT MIN(SAL) FROM EMP;
SELECT EMPNO, MIN(SAL) FROM EMP GROUP BY EMPNO;


MAX() function :-
It returns the maximum or biggest value of a column. The column can be numeric, character or date.

SELECT MAX(SAL) FROM EMP;
SELECT EMPNO, MAX(SAL) FROM EMP GROUP BY EMPNO;


NOTE:
SELECT ENAME, SUM(SAL) FROM EMP GROUP BY EMPNO => This query will not work, as ENAME is not in the GROUP BY clause.

SELECT EMPNO, SUM(SAL) FROM EMP GROUP BY EMPNO => This query will work, as EMPNO is present in the GROUP BY clause.

SELECT SUM(SAL) FROM EMP GROUP BY EMPNO => This query will work, as we are not selecting any non-aggregated column in the SELECT statement.







