Basic Informatica Tutorial

Tuesday, July 30, 2019

Unix Filters

List of Filters in Unix:-
-----------------------
$HEAD
$TAIL
$NL
$CUT
$PASTE
$SORT
$TR
$TEE
$SED
$GREP
$FGREP
$EGREP

$HEAD :- (These are line filters)
--------
head filename : It displays the first 10 lines of a file (the default).
head -3 file1 : It displays the first 3 lines.
head -5 file1 : It displays the first 5 lines.
head -v -n 5 file1 : It displays the first 5 lines preceded by a header with the filename (here v indicates verbose; this option is available in GNU head).
head -5 file1 file2 : It displays the first 5 lines of each file, each preceded by a header with its filename.

$TAIL :- (These are line filters)
-------
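tail filename : It displays the last 10 lines of a file (the default).
tail -3 file1 : It displays the last 3 lines.
tail -5 file1 : It displays the last 5 lines.
tail -v -n 5 file1 : It displays the last 5 lines preceded by a header with the filename (here v indicates verbose; this option is available in GNU tail).
tail -5 file1 file2 : It displays the last 5 lines of each file, each preceded by a header with its filename.
head -5 file1 | tail -1 : It takes the first 5 lines and then displays the last line of that result, that is, the 5th line of file1. Note that tail must read from the pipe; if a filename is given to tail, the piped input is ignored.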
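
A minimal sketch of the head and tail commands above, run against a throwaway 10-line file. The temporary directory and the file name file1 are assumptions made only for this illustration.

```shell
# Create a scratch directory and a 10-line sample file (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file1

first3=$(head -n 3 file1)             # first 3 lines
last2=$(tail -n 2 file1)              # last 2 lines
fifth=$(head -n 5 file1 | tail -n 1)  # tail reads the pipe, so this is line 5

printf '%s\n' "$first3" "$last2" "$fifth"
```

The -n form is used here because it is the portable spelling of head -3 and tail -2.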

NL :- (Number lines of files)
------
nl file1 : It displays the contents of file1 with line numbers added (by default only non-empty lines are numbered).
nl file1 > file2 : It writes the numbered contents of file1 to file2 instead of the screen.

CUT :- (It removes sections from each line of files)
------
cut -d ',' -f 3 file1 : It displays only the 3rd field from file1 (the complete 3rd column is displayed). For delimiter-separated fields, the -d option names the delimiter.
cut -d ' ' -f 3 file1 : It displays only the 3rd field from file1, using a space as the delimiter. If -d is omitted, the default delimiter is the tab character.
cut -d ',' -f 2,3 file1 : It displays only the 2nd and 3rd fields from file1 (the complete 2nd and 3rd columns are displayed).
cut -d ',' -f 1,3 file1 : It displays only the 1st and 3rd fields from file1 (the complete 1st and 3rd columns are displayed).
cut -d ' ' -f 1-3 file1 : It displays the 1st through 3rd fields from file1 (the complete 1st to 3rd columns are displayed).
cut -c1,7 file1 : It displays the 1st character and the 7th character of each line.
cut -c1-7 file1 : It displays the 1st through 7th characters of each line.
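
The cut options above can be tried on a small comma-separated file. This is only a sketch; emp.csv and its three columns (id, name, salary) are made-up sample data.

```shell
# Hypothetical sample data: id,name,salary
workdir=$(mktemp -d)
printf '101,Smith,5000\n102,Jones,6000\n' > "$workdir/emp.csv"

names=$(cut -d ',' -f 2 "$workdir/emp.csv")     # 2nd field only
id_sal=$(cut -d ',' -f 1,3 "$workdir/emp.csv")  # 1st and 3rd fields
chars=$(cut -c 1-3 "$workdir/emp.csv")          # characters 1 to 3 of each line

printf '%s\n' "$names" "$id_sal" "$chars"
```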

PASTE :- (It merges lines of files)
--------
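paste file1 file2 : It merges the two files line by line and displays them side by side, separated by a tab.
paste -s file1 file2 : It merges serially: all lines of file1 are joined into one output line, followed by all lines of file2 joined into the next line.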
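
A short sketch of the difference between the two paste forms, assuming two tiny sample files (the names and contents are illustrative only).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\n' > file1
printf '1\n2\n' > file2

side=$(paste file1 file2)       # a<TAB>1 / b<TAB>2   (side by side)
serial=$(paste -s file1 file2)  # a<TAB>b / 1<TAB>2   (one output line per file)

printf '%s\n' "$side" "$serial"
```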

SORT :-
-------
sort file1 : It sorts the lines of file1 in ascending (text) order.
sort -r file1 : It sorts the lines of file1 in reverse order.
sort -n file1 : It sorts the lines of file1 in numerical order, if numeric data exists.
sort -rn file1 : It sorts the lines of file1 in reverse numerical order, if numeric data exists.
sort -o file2 file1 : It sorts the lines of file1 and writes the result to file2.
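
The difference between text order and numerical order is easiest to see on numbers of different lengths. A minimal sketch, assuming a sample file named nums.txt:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '10\n9\n100\n' > nums.txt

text_order=$(LC_ALL=C sort nums.txt)  # compares character by character
num_order=$(sort -n nums.txt)         # compares numeric values
rev_num=$(sort -rn nums.txt)          # numeric, largest first
sort -n -o sorted.txt nums.txt        # write the result to a file instead

printf '%s\n' "$text_order" "$num_order" "$rev_num"
```

In text order 9 sorts after 100 because the character 9 is greater than the character 1, which is why -n matters for numeric columns.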

TR :- (Translate or delete characters)
-----
tr 's' 'n' < file1 : It converts every character s to n in the contents of file1.
tr 'a-z' 'A-Z' < file1 : It converts all the lowercase letters into capital letters.
tr 'aeiou' 'AEIOU' < file1 : It converts all the lowercase vowels into capital vowels.
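
A small sketch of tr on a sample string. The -d line illustrates the "delete" half of the heading; the input text is made up for the example.

```shell
text='unix filters'

upper=$(printf '%s\n' "$text" | tr 'a-z' 'A-Z')        # all letters to capitals
vowels=$(printf '%s\n' "$text" | tr 'aeiou' 'AEIOU')   # only vowels to capitals
no_vowels=$(printf '%s\n' "$text" | tr -d 'aeiou')     # delete the vowels

printf '%s\n' "$upper" "$vowels" "$no_vowels"
```

Because tr reads only standard input, file contents are passed with < file1 or through a pipe, as above.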

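TEE :- (It reads from standard input and writes to both standard output and files)
-----
cal | tee file1 : It displays the output of cal on the screen and also saves it in file1.
wc -l file1.txt | tee file2.txt : It displays the line count of file1.txt on the screen and also saves it in file2.txt.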
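
A minimal sketch showing that tee passes its input through unchanged while saving a copy. The file names follow the example above; the file contents are assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > file1.txt

# The count is captured from standard output and also saved by tee.
shown=$(wc -l file1.txt | tee file2.txt)
saved=$(cat file2.txt)

printf 'on screen: %s\nin file2.txt: %s\n' "$shown" "$saved"
```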

SED :-
------
(The sed command, or Stream Editor, is a very powerful utility offered by Linux/Unix systems. It is mainly used for text substitution (find and replace), but it can also perform other text manipulations such as insertion, deletion and search. With sed, we can edit complete files without having to open them in an editor.)

sed 2q file1 : It displays only the first two lines and then quits.
sed -n 2p file1 : It displays only the second line from file1.
sed -n -e 2p -e 5p file1 : It displays only the second line and the 5th line from file1.
sed 4d fruits : '4' is the line number and the command 'd' deletes that line from the output. The file itself is not changed.
sed 3d fruits > newfile : '3' is the line number and the command 'd' deletes that line; the result is written to newfile.
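
The sed commands above can be checked against a five-line sample file. This is a sketch: the contents of fruits are assumed, and the last line adds a basic substitution, which the introduction mentions as the main use of sed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\ndate\nelderberry\n' > fruits

first_two=$(sed 2q fruits)                  # print lines 1-2, then quit
second=$(sed -n 2p fruits)                  # print only line 2
second_fifth=$(sed -n -e 2p -e 5p fruits)   # print lines 2 and 5
sed 4d fruits > newfile                     # everything except line 4
replaced=$(sed 's/apple/mango/' fruits | head -n 1)  # find and replace

printf '%s\n' "$first_two" "$second" "$second_fifth" "$replaced"
```

The original fruits file still has all five lines afterwards, because sed writes to standard output unless told otherwise.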

GREP :- (Globally search for a regular expression and print matching lines)
-------
grep Apple file1 : It displays the lines from file1 that contain Apple (the match is case-sensitive; use -i to ignore case).
grep column_name file1 : It displays the lines from file1 that contain column_name.
grep -n column_name file1 : It displays the matching lines from file1 along with their line numbers.

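FGREP :- (Fixed-string grep: searches for literal strings rather than regular expressions)
-------
fgrep pattern file1 : It displays the lines from file1 that contain pattern as a literal string. It is equivalent to grep -F.

EGREP :- (Extended grep: searches using extended regular expressions)
-------
egrep 'pat1|pat2' file1 : It displays the lines from file1 that match either pattern. It is equivalent to grep -E.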
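
A short sketch contrasting grep, fgrep and egrep on one sample file. The file contents are made up; grep -F and grep -E are used in place of fgrep and egrep because they behave the same way and recent GNU grep releases warn that the old names are obsolescent.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Apple 1.5\napple 2.0\nBanana 105\nCherry 3.0\n' > file1

numbered=$(grep -n Apple file1)         # case-sensitive match with line number
regex_dot=$(grep '1.5' file1)           # . matches any character, so 105 matches too
literal=$(grep -F '1.5' file1)          # fixed string: only a real "1.5" matches
either=$(grep -E 'Apple|Cherry' file1)  # extended regex alternation

printf '%s\n' "$numbered" "$regex_dot" "$literal" "$either"
```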

Sunday, June 30, 2019

Indirect File Method

Unlike the Direct method, the Indirect file method is used when we have multiple source files with the same structure and want to load them into a single target.

To configure Indirect method :-
----------------------------------
- Create a flat file (the file list) and put the paths of all the source flat files in it, one per line.

- Drag the source definition of any one of the flat files into the mapping and develop the mapping logic.
- In the session level properties, set the source filetype to 'Indirect' instead of 'Direct'.
- Give the path of the file list in the source file directory field.
- Give the name of the file list in the source filename field.
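
The file list from the first step can be generated from the shell instead of being typed by hand. A minimal sketch, assuming the source files share a naming pattern; the directory, the emp_*.csv pattern and the name emp_filelist.txt are all hypothetical.

```shell
# Stand-in for the real source file directory, with three empty sample files.
src_dir=$(mktemp -d)
touch "$src_dir/emp_east.csv" "$src_dir/emp_west.csv" "$src_dir/emp_north.csv"

# Write one full path per line into the file list.
printf '%s\n' "$src_dir"/emp_*.csv > "$src_dir/emp_filelist.txt"

cat "$src_dir/emp_filelist.txt"
```

The list is given a .txt extension so that it does not match the emp_*.csv pattern and end up listing itself.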

Saturday, June 29, 2019

SCD Type 2 Flag

SCD Type 2 Flag is a method in which historical data is maintained along with current data, and a flag column marks which row is the current version.

- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on target table.
- Define the lookup condition on the primary key columns, because they are the matching columns between source and target.

- Create an Expression, drag the ports from the Lookup transformation into it and prefix their names with Prev. These ports hold the existing (old) values that are compared with the new ones.
- Drag the comparison keys from the Source Qualifier to the Expression.
- Create two ports, NEWFLAG and CHANGEDFLAG, and give their expressions as Isnull (Cust_key) and Not Isnull(Cust_key) and (Prev_Empn != Empno OR Prev_Ename != Ename OR Prev_Sal != Sal) respectively.

- The first expression checks for nulls. If the lookup returns null (no matching row in the target), the row is new and is inserted into the target table.
- The second expression checks for existing rows. If a matching row exists and any compared value differs, the row is treated as changed: the new version is inserted and the old version is updated in the target table.
- Create Filter1, drag all the ports from the Source Qualifier into it and give the filter condition as NEWFLAG.
- Connect an Update Strategy and give the 'Update Strategy Expression' as DD_INSERT.

- Create Filter2, drag Cust_key and the comparison keys from the source into it and give the filter condition as CHANGEDFLAG.
- Connect an Update Strategy and give the 'Update Strategy Expression' as DD_INSERT.
- Create a Sequence Generator, connect it to the Expression and connect the NEXTVAL port to the CUST_KEY ports in target instances 1 and 2.
- Also create a FLAG port, assign the value 1 to it and connect the port to target instances 1 and 2.
- Create another Expression, drag CUST_KEY to it from Filter2 and create a FLAG port with the value 0.
- Create a new Update Strategy and give the 'Update Strategy Expression' as DD_UPDATE.
- Make three instances of the target. Connect Update Strategy 1 to target 1, Update Strategy 2 to target 2 and Update Strategy 3 to target 3.
- Here the first pipeline inserts new rows into the target, the second pipeline inserts the new version of changed rows, and the third pipeline updates the old version of changed rows. Connect only those ports which you want to insert or update in the target table.
- Final mapping looks like below screen shot

Note 1 : Lookup condition should be on logical key column
Note 2 : Lookup transformation should contain Logical key, Primary key & Comparison key columns.
Note 3 : Second pipeline in the mapping contains comparison key, primary key & changed flag columns.

Note 4 : Expression transformation should contain only primary key, comparison keys along with related source keys and new and changed flag.

SCD Type 2 Effective Date

SCD Type 2 Effective Date is a method in which historical data is maintained along with current data, and begin and end date columns record the period for which each version of a row was valid.

- Create a mapping and name it.
- Drag Emp source into the mapping area.
- Create a Lookup transformation on target table.
- Define the lookup condition on the primary key columns, because they are the matching columns between source and target.

- Create Filter2, drag Cust_key and the comparison keys from the source into it and give the filter condition as CHANGEDFLAG.
- Connect an Update Strategy and give the 'Update Strategy Expression' as DD_INSERT.
- Create another Expression, drag CUST_KEY to it from Filter2 and create an END_DATE port. Give its expression as SYSDATE.
- Create a new Update Strategy and give the 'Update Strategy Expression' as DD_UPDATE.
- Make three instances of the target. Connect Update Strategy 1 to target 1, Update Strategy 2 to target 2 and Update Strategy 3 to target 3.
- Here the first pipeline inserts new rows into the target, the second pipeline inserts the new version of changed rows, and the third pipeline updates the old version of changed rows. Connect only those ports which you want to insert or update in the target table.
- Create a Sequence Generator and connect it to the Expression, create a BEGIN_DATE port there with the value SYSDATE, and connect it to target instance 1.
- Final mapping looks like below screen shot

Note 1 : Lookup condition should be on logical key column
Note 2 : Lookup transformation should contain Logical key, Primary key & Comparison key columns.
Note 3 : Second pipeline in the mapping contains comparison key, primary key & changed flag columns.
Note 4 : Expression transformation should contain only primary key, comparison keys along with related source keys and new and changed flag.

Monday, June 24, 2019

Normalizer Transformation

Normalizer Transformation is an Active and Connected transformation. It converts a single row that contains multiple-occurring columns into multiple rows, that is, it converts a de-normalized table into a normalized table. You cannot drag & drop columns to the Normalizer transformation like the rest of the transformations; its ports are defined on the Normalizer tab.

* Normalizer is used to convert columns into rows.
* Normalizer is used in the place of Source Qualifier while reading mainframe or COBOL sources.

Steps to create Normalizer transformation :-
-----------------------------------------------------------------
- Go to the Transformations tab and select Normalizer
- Double click the Normalizer and select the Normalizer tab
- Add the columns and set the Occurs value for each based on the requirement

In the Normalizer properties tab, 'Reset' and 'Restart' are the two options available.

Reset is used to reset the GK value at the end of the session to the value it had before the session.
Restart is used to start the GK sequence from 1 each time the session runs.

There are 2 important ports available and they are GK and GCID

GK (generated key) generates a sequence number starting from the value defined in the sequence field.

GCID (generated column ID) holds the occurrence number of the multiple-occurring column that the output row came from.
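
The effect of the Normalizer can be imitated outside Informatica with a few lines of shell and awk. This is only an analogy, not Informatica code: the store names, the four quarterly sales columns and the output layout (GK, store, GCID, sales) are assumptions for illustration.

```shell
# Each input row holds one store and four quarterly sales figures (Occurs = 4).
rows=$(printf 'StoreA,100,200,300,400\nStoreB,150,250,350,450\n' |
  awk -F',' '{
    # Emit one output row per occurrence:
    # gk counts output rows (like GK), i - 1 is the occurrence number (like GCID).
    for (i = 2; i <= NF; i++) printf "%d,%s,%d,%s\n", ++gk, $1, i - 1, $i
  }')

printf '%s\n' "$rows"
```

Two input rows with four occurrences each become eight output rows, which is why the Normalizer is an Active transformation.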

Sunday, June 23, 2019

Types Of Keys In Database

Primary Key :-
---------------
A primary key is a single field or combination of fields that uniquely identifies a row in the table.

The following are rules that make a column a primary key:
- A primary key column cannot contain a NULL value
- A primary key value must be unique within the entire table
- A primary key value should not be changed over time

Foreign Key :-
---------------
A foreign key means that values in one table must also appear in another table.
The referenced table is called the parent table while the table with the foreign key is called the child table.
The foreign key in the child table will generally reference a primary key in the parent table.

Natural Key and Surrogate Key:-
------------------------------------
Sometimes the primary key is made up of real data and these are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key.

A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. When I say “real data” I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record.

Surrogate keys are keys that don’t have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time right before the record is inserted into a table.

Sunday, May 26, 2019

Mapplet & Worklet

Mapplet :-

----------

If we want to use the same logic in multiple mappings, we use Mapplets. Instead of creating the logic every time, we can create it once in a Mapplet and use it in many mappings.

- Go to the Mapplet Designer and create the Mapplet Input and Mapplet Output transformations from the Transformations menu.

- Build the mapping logic between the Mapplet Input and Mapplet Output.

- Now use this Mapplet in a mapping wherever the logic is required.

- If you want to show the Mapplet logic in the mapping, go to the mapping and click the expand/unexpand option. Select the correct one from the list of Mapplets & click ok.

Worklet :-
----------
If we want to group sessions and manage the dependencies between them inside a workflow, we use Worklets.
If you have many sessions in a workflow, it's very difficult to manage the dependencies among them. Instead of adding dependencies among the individual sessions, we group the sessions into Worklets and create the dependencies among the Worklets, which makes our work easier.

- Create a Worklet, name it, then add sessions to it and connect them.

- Drag the Worklets from the left into the Workflow Designer and connect all of them together.

Reusable Transformation :-

--------------------------------

If we want to use a transformation again & again in different mappings, we can create a reusable transformation.

Double click the transformation & check the Reusable option to make it a reusable transformation.

We cannot edit it in the mapping once it's converted into a reusable transformation. We need to go to the Transformation Developer to change anything if required. If a transformation is created in the Transformation Developer, it is a reusable transformation by default.