When ETL Tool Informatica takes time to execute the workflow then we need to perform performance tuning in order to execute faster. There many steps involved in performing performance tuning.
At first when a session is triggered the integration service starts Data Transformation Manger (DTM) which is responsible to start reader, writer and transformation thread.
'Reader thread' is responsible for reading data from sources.
If there is any problem while reading data then its called 'Source bottleneck'
'Writer thread' is responsible for writing data to the targets.
If there is any problem while writing data then its called 'Target bottleneck'
'Transformation thread' is responsible to process data according to the logic.
If there is any problem while processing data then its called 'Mapping bottleneck'
When ever we run a session, it will capture all the thread statistics or details in the session log. In order to do performance tuning we would check the busy percentage in session log.
If the busy percentage is nearly 100, either in reader, writer or transformation threads, then we would identify it as performance bottleneck.
At first when a session is triggered the integration service starts Data Transformation Manger (DTM) which is responsible to start reader, writer and transformation thread.
'Reader thread' is responsible for reading data from sources.
If there is any problem while reading data then its called 'Source bottleneck'
'Writer thread' is responsible for writing data to the targets.
If there is any problem while writing data then its called 'Target bottleneck'
'Transformation thread' is responsible to process data according to the logic.
If there is any problem while processing data then its called 'Mapping bottleneck'
When ever we run a session, it will capture all the thread statistics or details in the session log. In order to do performance tuning we would check the busy percentage in session log.
If the busy percentage is nearly 100, either in reader, writer or transformation threads, then we would identify it as performance bottleneck.
Lets look at how to do performance tuning for the bottlenecks.
Source Bottleneck :-
---------------------
Incorrect source query or huge size queries or connection problems or small database network packet sizes can cause source bottleneck.
Performance tuning :-
- Rectify the source query
- Rectify the connections
Target Bottleneck :-
---------------------------
If problem is in Target then it can be due to heavy load into the target or connection problems
Performance tuning :-
- Use Bulk Load option to insert data.
- Increasing the commit interval from 10,000 to 50,000.
- Drop index before data loading and create indexes after data loading with the help of Pre SQL and Post SQL which is under truncate target table option.
- Increase Database network packet size from default 4096.
Mapping Bottleneck :-
-----------------------
If problem is in Transformation thread then it is called mapping bottleneck and we would tune Aggregator, Joiner, Lookup.
Aggregator :
- Filter the data before applying aggregation as it will reduce unnecessary aggregation operation.
- Limit no of ports used in aggregator this will reduce volume of data that stored inside aggregator cache.
- Use sorted input
- Group by on simple columns, use numbers for instead of string and date columns for Group By port.
Joiner :
- Always select the table with the less no of rows as master table.
- Sort the data before joining the tables.
Lookup :
- Create an index for the column which is used in lookup condition.
- Delete unnecessary columns from the lookup table.
Filter :-
- Use filter transformation as early as possible as it will remove unwanted data early in the mapping.
Source Qualifier :
- Bring only the required column from the source.
- Avoid using order by clause in source qualifier SQL override
Session Bottleneck :-
------------------------------
If there is no bottleneck in the source, target or mapping then we would check the session bottleneck
---------------------
Incorrect source query or huge size queries or connection problems or small database network packet sizes can cause source bottleneck.
Performance tuning :-
- Rectify the source query
- Rectify the connections
Target Bottleneck :-
---------------------------
If problem is in Target then it can be due to heavy load into the target or connection problems
Performance tuning :-
- Use Bulk Load option to insert data.
- Increasing the commit interval from 10,000 to 50,000.
- Drop index before data loading and create indexes after data loading with the help of Pre SQL and Post SQL which is under truncate target table option.
- Increase Database network packet size from default 4096.
Mapping Bottleneck :-
-----------------------
If problem is in Transformation thread then it is called mapping bottleneck and we would tune Aggregator, Joiner, Lookup.
Aggregator :
- Filter the data before applying aggregation as it will reduce unnecessary aggregation operation.
- Limit no of ports used in aggregator this will reduce volume of data that stored inside aggregator cache.
- Use sorted input
- Group by on simple columns, use numbers for instead of string and date columns for Group By port.
Joiner :
- Always select the table with the less no of rows as master table.
- Sort the data before joining the tables.
Lookup :
- Create an index for the column which is used in lookup condition.
- Delete unnecessary columns from the lookup table.
Filter :-
- Use filter transformation as early as possible as it will remove unwanted data early in the mapping.
Source Qualifier :
- Bring only the required column from the source.
- Avoid using order by clause in source qualifier SQL override
Session Bottleneck :-
------------------------------
If there is no bottleneck in the source, target or mapping then we would check the session bottleneck
No comments:
Post a Comment