Wednesday, May 22, 2019

Roles and Responsibilities In My Project

Banking Domain Client :-
---------------------------
My client is ''. We work on customer data and employee data. 

Employee Data :
If an employee resigns, joins, receives a salary increment, or transfers to another department, the employee data changes.
So we update the Employee dimension target tables using SCD Type II techniques.
We load the fact tables after loading the dimension tables.

Facts & Dimensions Involved :
Date_Dim, State_Dim, Country_Dim, AccountType_Dim, Company_Dim, Currency_Dim, Bank_Dim
Balance_Fact, AccountID_Fact, CompanyID_Fact, BankID_Fact, CountryID_Fact


Roles and responsibilities in my project :-
---------------------------------------
1) The data we receive comes as flat files or from SQL Server / Oracle databases.

- We build change data capture mappings with a mapping variable whenever the source carries a transaction timestamp (see the sketch after this list).
- We follow Incremental Aggregation logic when we have incremental data.
- We follow the indirect file method when we have multiple flat files.
- We follow the 'configure concurrent workflow' approach when new flat files arrive now and then.
- We also use session partitioning to speed up large loads.
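A minimal sketch of the CDC idea: a Source Qualifier SQL override that picks up only rows newer than the last run. The table, columns, and the mapping variable $$LAST_EXTRACT_TS are hypothetical names for illustration:

-- Source Qualifier override; Informatica substitutes $$LAST_EXTRACT_TS at run time
SELECT cust_id, cust_name, last_updated
FROM   customer_stg
WHERE  last_updated > TO_DATE('$$LAST_EXTRACT_TS', 'YYYY-MM-DD HH24:MI:SS');

Inside the mapping, an expression such as SETMAXVARIABLE($$LAST_EXTRACT_TS, last_updated) advances the variable so the next run starts where this one stopped.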


2) I work on mapping code errors and respond once the problem is resolved.

- For connection problems, such as a source database change, we correct the connection in the parameter files (see the sketch after this list).
- When a workflow runs for a long time, we follow performance tuning techniques.
- When code suddenly stops working, we check the complete mapping code and correct it.
- We use the Debugger to debug the mapping code.
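A minimal sketch of such a parameter file, with hypothetical folder, workflow, session, and connection names; because the session resolves its relational connections from these parameters, a source database change is just an edit here:

[MyFolder.WF:wf_daily_load.ST:s_m_load_customer]
$DBConnection_SRC=ORA_SRC_PROD
$DBConnection_TGT=ORA_TGT_PROD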


3) We update dimensions whenever there is a requirement; mostly the requirement comes at the end of every month (a SQL sketch of the SCD Type II logic follows this list).

- Using a surrogate key (cust_key) in the target
- Using the MD5 function to detect changed rows
- Using a dynamic lookup
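A minimal SQL sketch of the SCD Type II logic such a mapping implements, assuming hypothetical dim_customer columns (cust_key surrogate key, eff_date/end_date validity range, curr_flag) and a sequence; :new_cust_id and :new_cust_name stand for the incoming row's values:

-- Expire the current version of a changed customer
UPDATE dim_customer
SET    end_date = SYSDATE, curr_flag = 'N'
WHERE  cust_id = :new_cust_id
AND    curr_flag = 'Y';

-- Insert the new version with a fresh surrogate key
INSERT INTO dim_customer
    (cust_key, cust_id, cust_name, eff_date, end_date, curr_flag)
VALUES
    (dim_customer_seq.NEXTVAL, :new_cust_id, :new_cust_name, SYSDATE, NULL, 'Y');

In the mapping, a dynamic lookup (or an MD5 compare of the incoming row against the stored hash) decides whether a row takes the update path, the insert path, or both.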


4) I apply error handling techniques to ensure no data is missing.


5) I perform performance tuning if a session is taking a long time.

6) We execute workflows on Unix with a wrapper script :
  (pmcmd startworkflow -sv <Integration Service Name> -d <Domain Name> -u <Integration Service Username> -p <Password> -f <Folder Name> <Workflow>)

7) We schedule and monitor jobs in Tivoli Workload Scheduler.

Debugger to Troubleshoot ETL Logic

The Debugger is an interface integrated with the Mapping Designer; it executes the mapping based on the transformation logic. Many times we get unexpected data in the target, and with the help of the Debugger we can analyse the data row by row.

Before starting the Debugger we need to select the session instance: a new session or an existing session.




Whenever we invoke the Debugger, a few windows are displayed to show debugging information, such as the Instance window, Target window, Mapping window, and Debugger log. We can monitor the Debugger through these windows.




Target Window :- This shows how the data is processed into the target. We can see whether each record is going to be inserted, updated, deleted, or rejected. If there are multiple target instances, select the instance name from the drop-down list.

Instance Window :- This shows how the data is transformed at a transformation instance; the window refreshes as the Debugger progresses from one transformation to the next. We can choose a transformation from the drop-down menu and see what the data looks like at that instance for a particular source row.


Break point in Debugger :-
----------------------------
Whenever we look for particular data, instead of checking every instance of the source data we can define break points. A break point is a condition defined while configuring the Debugger; when the condition becomes true, the Debugger halts at that particular instance (for example, a condition such as SAL > 5000 on an Expression instance).



Use the Continue option in the Debugger to run until the break point condition is true. With Continue we don't need to press the F10 key manually to step to the next row.


Error Handling Techniques

There are two types of errors: fatal errors and non-fatal errors.

A fatal error forces the session to fail.

This happens when the Integration Service is unable to access the source, target, or repository.

A non-fatal error does not force the session to fail.
It occurs while reading, transforming, or writing data from source to target; rejected records fall into this category.

To capture transformation-level errors, we need to set the 'Error Log Type' property on the Config Object tab of the session properties.

And we choose 'Relational Database' as the log type, along with the DB connection information.



As a result, four error tables are created in the database as below (a query sketch follows the list) :-


PMERR_DATA : Stores data and metadata about a transformation row error and its corresponding source rows.
PMERR_MSG : Stores metadata about an error and the error message.
PMERR_SESS : Stores metadata about the session.
PMERR_TRANS : Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs. From this we can verify the error that is causing the mapping to fail.
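A minimal sketch of how we query these tables after a failed run; the join columns (SESS_INST_ID, WORKFLOW_RUN_ID) and selected columns reflect a typical installation and should be verified against your own repository:

SELECT s.WORKFLOW_NAME,
       s.MAPPING_NAME,
       m.TRANS_NAME,   -- transformation where the row failed
       m.ERROR_TYPE,   -- 1 = reader, 2 = writer, 3 = transformation
       m.ERROR_MSG
FROM   PMERR_SESS s
JOIN   PMERR_MSG  m
  ON   m.SESS_INST_ID    = s.SESS_INST_ID
 AND   m.WORKFLOW_RUN_ID = s.WORKFLOW_RUN_ID
ORDER BY m.ERROR_TIMESTAMP;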


Understanding Error Log File :-
---------------------------------
Error Type : 
1 - Reader Error
2 - Writer Error
3 - Transformation Error

Source Row Type :
0 - Insert
1 - Update
2 - Delete
3 - Reject

Source Data :
D - Valid
O - Overflow
N - Null
T - Truncated
B - Binary
U - Data unavailable

Transformation Data :
D - Valid
O - Overflow
N - Null
T - Truncated
B - Binary
U - Data unavailable


Apart from these, there are user-defined exceptions, which we will learn about soon.


Stored Procedure Transformation

The Stored Procedure transformation is a passive transformation.
It can be used in both connected and unconnected mode.
Procedures are stored in the database, and we call them through the Stored Procedure transformation in Informatica.
We use it for calculations such as a salary hike or the minimum and maximum salary (a PL/SQL sketch follows the examples below).

Examples :-
------------
If an application updates the customer table in ten different places, there can be a single stored procedure and a standard procedure call from the application for this functionality.
If a stored procedure has four SQL statements in the code, then there only needs to be a single call to the database instead of four calls for each individual SQL statement.
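A minimal PL/SQL sketch of the kind of procedure such a transformation might call; the procedure name, table, and hike logic are illustrative assumptions, not actual project code:

-- Hypothetical procedure: computes a hiked salary for one employee
CREATE OR REPLACE PROCEDURE sp_salary_hike (
    p_empno   IN  NUMBER,   -- employee to process
    p_pct     IN  NUMBER,   -- hike percentage
    p_new_sal OUT NUMBER    -- computed salary, returned to the mapping
) AS
BEGIN
    SELECT sal * (1 + p_pct / 100)
    INTO   p_new_sal
    FROM   emp
    WHERE  empno = p_empno;
END;
/

In the transformation, p_empno and p_pct map to input ports and p_new_sal to an output port, so the procedure runs once per row in Normal mode.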


Types of ports :-
----------------------
Input : This port is used to pass input parameters to the procedure.
Output : This port displays the output values as per the procedure defined in the database.
Return : This port is used to receive the procedure's return value.


Stored Procedure Type :-
--------------------------
There are 5 different options available in the Stored Procedure transformation properties.

- Normal : Runs on a row-by-row basis during the session.
- Source Pre Load : Works like the 'Pre SQL' property of the Source Qualifier; it acts on the database before data is read from the source.
- Source Post Load : Works like the 'Post SQL' property; it acts on the database after the source data has been read.
- Target Pre Load : Acts on the database before data is written to the target.
- Target Post Load : Acts on the database after the load to the target completes.


To create a stored procedure transformation :-

- Go to the Transformations menu and click the Import Stored Procedure option.
- Connect to the database where the stored procedure is stored and select it.
- Now connect the appropriate ports to the input and output ports.




- Select the connection information 



- Now execute the workflow; we get the expected results.

Friday, May 10, 2019

Loops in Stored Procedures

Whenever we execute a procedure, each query or statement executes once.
A loop is a programming construct used to execute a group of statements repeatedly for a specific number of times.

3 Types of Loops (sketched below) :-
- Simple Loop
- For Loop
- While Loop
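A minimal PL/SQL sketch showing all three loop types side by side (the bounds and messages are arbitrary):

DECLARE
    i NUMBER;
BEGIN
    -- Simple loop: the body runs until EXIT WHEN fires
    i := 1;
    LOOP
        DBMS_OUTPUT.PUT_LINE('Simple loop: ' || i);
        i := i + 1;
        EXIT WHEN i > 3;
    END LOOP;

    -- FOR loop: the counter is declared and advanced automatically
    FOR j IN 1 .. 3 LOOP
        DBMS_OUTPUT.PUT_LINE('For loop: ' || j);
    END LOOP;

    -- WHILE loop: the condition is checked before each pass
    i := 1;
    WHILE i <= 3 LOOP
        DBMS_OUTPUT.PUT_LINE('While loop: ' || i);
        i := i + 1;
    END LOOP;
END;
/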

Thursday, May 9, 2019

Aggregate functions

Aggregate functions :-
The functions SUM, COUNT, AVG, MIN and MAX are the common aggregate functions; their result does not depend on the order of the records.

Aggregate functions return a single output value calculated from a table or column. With a GROUP BY clause, they return one value per group rather than repeating the value on every record.

SELECT DEPTNO, COUNT(*) DEPT_COUNT FROM EMP 
WHERE DEPTNO IN (20, 30)
GROUP BY DEPTNO;

DEPTNO                 DEPT_COUNT             
---------------------- ---------------------- 
20                     5                      
30                     6 


SUM() function :-
It returns the sum of a column. The column should be numeric. 

SELECT SUM(SAL) FROM EMP;
SELECT EMPNO, SUM(SAL) FROM EMP GROUP BY EMPNO;


COUNT() function :-
It counts the number of rows returned by a query. COUNT(*) counts all rows, while COUNT(column) does not count NULL values in that column.

SELECT COUNT(*) FROM EMP;
OR 
SELECT COUNT(EMPNO) FROM EMP;
SELECT EMPNO, COUNT(EMPNO) FROM EMP GROUP BY EMPNO;


AVG() function :-
It returns the average value of a column. The column should be numeric.

SELECT AVG(SAL) FROM EMP;
SELECT EMPNO, AVG(SAL) FROM EMP GROUP BY EMPNO;


MIN() function :-
It returns the minimum or smallest value of a column. The column can be numeric, a date, or a string.

SELECT MIN(SAL) FROM EMP;
SELECT EMPNO, MIN(SAL) FROM EMP GROUP BY EMPNO;


MAX() function :-
It returns the maximum or biggest value of a column. The column can be numeric, a date, or a string.

SELECT MAX(SAL) FROM EMP;
SELECT EMPNO, MAX(SAL) FROM EMP GROUP BY EMPNO;


NOTE:
SELECT ENAME, SUM(SAL) FROM EMP GROUP BY EMPNO => This query will not work, as ENAME is selected but is not in the GROUP BY clause.

SELECT EMPNO, SUM(SAL) FROM EMP GROUP BY EMPNO => This query will work, as EMPNO is present in the GROUP BY clause.

SELECT SUM(SAL) FROM EMP GROUP BY EMPNO => This query will work, as no non-aggregated column appears in the SELECT list, so the GROUP BY clause causes no issues.








Tuesday, May 7, 2019

Analytical Functions

The functions SUM, COUNT, AVG, MIN and MAX are the common aggregate functions, the result of which does not depend on the order of the records.
Analytical functions like LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST_VALUE, LAST and LAST_VALUE depend on the order of the records.

Analytical functions :-
----------------------
Analytic functions return the same number of rows as the input: they give an aggregate-style result on every row without grouping the result set.
They are also used to arrange records in a particular order, for example ranking employees in ascending or descending order with RANK or DENSE_RANK.

SELECT empno, deptno,
       COUNT(*) OVER (PARTITION BY deptno) AS DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

     EMPNO     DEPTNO DEPT_COUNT
---------- ---------- ----------
      7369         20          5
      7566         20          5
      7788         20          5
      7902         20          5
      7876         20          5
      7499         30          6
      7900         30          6
      7844         30          6
      7698         30          6
      7654         30          6
      7521         30          6



Types of Analytical Functions
- Rank
- Dense_rank
- Row_number
- Count
- Lag
- Lead
- First_value
- Last_value
- First
- Last


RANK function :-
It is used to rank records within a group of rows. With ties, RANK() assigns the same rank and skips the following ranks.
RANK() will return {1, 1, 3, 4, 4, 6} (note that the values 2 and 5 are skipped).

SELECT EMPNO, ENAME, JOB, RANK() OVER (ORDER BY JOB) AS RANK FROM EMP;

(OR) 

 SELECT E.*, RANK() OVER (ORDER BY JOB DESC) AS RANK FROM EMP E;


DENSE_RANK function :-
Dense_rank function acts like the RANK function except that it assigns consecutive ranks.
DENSE_RANK() will return {1,1,2,3,3,4}

SELECT EMPNO, ENAME, JOB, DENSE_RANK() OVER (ORDER BY JOB) AS RANK FROM EMP;

(OR) 

SELECT E.*, DENSE_RANK() OVER (ORDER BY JOB DESC) AS DRANK FROM EMP E;

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+



ROW_NUMBER function :-
ROW_NUMBER() assigns a temporary sequential number to each row; it is computed at query time rather than stored.

SELECT ROW_NUMBER() OVER (PARTITION BY JOB ORDER BY JOB) AS ROWNUMBER,
EMPNO, ENAME, JOB FROM EMP;

(OR)

SELECT E.*, ROW_NUMBER() OVER (ORDER BY JOB DESC) AS RN FROM EMP E;

(OR)

SELECT * FROM
(SELECT e.*, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY empno) AS rn FROM emp e)
WHERE rn > 1;

(Note: This form is used to find duplicate records in a table.)


ROWID pseudocolumn :-
ROWID is permanent; it is the physical address of a row and serves as a permanent unique identifier.

SELECT EMPNO, ENAME, JOB, ROWID FROM EMP;


LEAD function :-
It is used to display subsequent row data along with the current row data.
For example, by using the LEAD() function you can access, from the current row, the data of the next row, the second row that follows, the third row that follows, and so on.

Syntax :
LEAD ( scalar_expression [ ,offset ] , [ default ] )   
OVER ( [ partition_by_clause ] order_by_clause )

SELECT EMPNO, ENAME, JOB, SAL, LEAD(SAL) OVER(PARTITION BY JOB ORDER BY SAL) AS NEXT_SAL FROM EMP;

(OR)

SELECT E.*, LEAD(SAL) OVER (ORDER BY SAL DESC) AS NEXT_SAL
FROM EMP E;



LAG function :-
It is used to display previous row data along with the current row data. The offset written after the column, as in LAG(SAL, 1), tells it how many rows to look back; 1 is the default, so it can be omitted, while an offset of 2 looks back two rows.
A third argument supplies a default for rows with no preceding row, e.g. LAG(SAL, 1, 0) returns 0 instead of NULL for the first row.

Syntax :
LAG ( scalar_expression [ ,offset ] , [ default ] )   
OVER ( [ partition_by_clause ] order_by_clause )

SELECT EMPNO, ENAME, SAL, LAG(SAL) OVER (ORDER BY SAL DESC) AS PREV_ROW FROM EMP;
(OR)
SELECT EMPNO, ENAME, SAL, LAG(SAL,1) OVER (ORDER BY SAL DESC) AS PREV_ROW FROM EMP;


SELECT EMPNO, ENAME, SAL, LAG(SAL,1,0) OVER (ORDER BY SAL DESC) AS PREV_ROW FROM EMP;

(OR)

SELECT E.*, LAG(SAL) OVER (ORDER BY SAL DESC) AS PREV_SAL
FROM EMP E;






FIRST_VALUE function :-
It is used to display the first value of the specified column, based on the ORDER BY clause.

SELECT EMPNO, ENAME, SAL, FIRST_VALUE(SAL) OVER (ORDER BY SAL) AS LOWEST_SAL FROM EMP;

SELECT EMPNO, ENAME, SAL, FIRST_VALUE(SAL) OVER (PARTITION BY JOB ORDER BY SAL) AS LOWEST_SAL FROM EMP;

(OR)

This query returns the first (lowest) salary in each department:
SELECT e.*, FIRST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal) AS first_sal FROM emp e;

LAST_VALUE function :-
It is used to display the last value of the specified column, based on the ORDER BY clause.

SELECT EMPNO, ENAME, SAL, LAST_VALUE(SAL) OVER (ORDER BY SAL) AS LAST_SAL FROM EMP;
This syntax does not return the expected result: with the default window frame, which ends at the current row, each row sees its own salary as the last value.

SELECT EMPNO, ENAME, SAL, LAST_VALUE(SAL) OVER (ORDER BY SAL ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS HIGHEST_SAL FROM EMP;
This syntax gives the correct results.

(OR)

This query returns the last (highest) salary in each department:
SELECT e.*, LAST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sal FROM emp e;

NTH_VALUE function :-
This query returns the second salary in each department window (in Oracle, NTH_VALUE is available from version 11gR2):
SELECT e.*, NTH_VALUE(sal, 2) OVER (PARTITION BY deptno ORDER BY sal) AS nv FROM emp e;

PARTITION BY clause :-
This clause is used only with the OVER clause. It breaks up or divides the table data into partitions, within which we can apply aggregate and analytic functions.


SELECT DEPTNO, ENAME, SAL, AVG(SAL) OVER (PARTITION BY DEPTNO) AS AVG_SAL FROM EMP;










