
Tuesday, November 4, 2025

Data Modeling

- Data modeling is the process of creating a visual representation of data and its relationships. It serves as a blueprint for how data is structured, stored, and accessed, ensuring consistency and clarity in data management.

- A well-designed model helps prevent common issues like redundancy, performance bottlenecks, and difficulty adapting to future changes.  


Types of Data Models:
---------------------------
- Conceptual Data Model: Defines key business entities and their relationships without getting into technical details.

- Logical Data Model: Defines how the data is structured. It includes the data's attributes, relationships, and constraints, and acts as a bridge between business requirements and the physical implementation of the data.

- Physical Data Model: Defines how the data is stored in a database, including specific table structures, indexes, and storage mechanisms.
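
To make the contrast concrete, here is a minimal sketch of a physical data model in SQL for a hypothetical Customer entity (all table, column, and index names are illustrative assumptions, not from any particular system):

-- Physical model: exact column types, constraints, and access structures
CREATE TABLE Customer (
    Customer_ID INT          NOT NULL,
    Full_Name   VARCHAR(100) NOT NULL,
    Email       VARCHAR(255),
    Created_At  DATE,
    CONSTRAINT PK_Customer PRIMARY KEY (Customer_ID)
);

-- The physical model also specifies indexes and storage details
CREATE INDEX IX_Customer_Email ON Customer (Email);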


Data Modeling Techniques:
---------------------------------
Data modeling is not a one-size-fits-all process. Different techniques are employed depending on the complexity of the data and the goals. There are four different data modeling techniques:

* Entity-relationship (ER) modeling: Represents data in terms of entities, the relationships between them, and their attributes.
- Entities
- Relationships
- Attributes

* Dimensional modeling: Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions (see the sketch after this list).
- Fact table
- Dimension tables

* Object-oriented modeling: Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects.
- This approach is particularly beneficial in object-oriented programming (OOP) languages like Java and Python, where data models can be directly mapped to classes and objects.

* NoSQL and document-based modeling: NoSQL and document-based modeling techniques are designed for flexible, schema-less databases.
- In NoSQL databases like MongoDB, a document-based model organizes data into collections of documents, where each document can have a unique structure.
- This flexibility allows for faster iteration and scaling, particularly in big data environments or applications requiring high-speed data access.
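
As a minimal sketch of dimensional modeling (table and column names are illustrative assumptions), a fact table stores measures and references the dimension tables by their keys:

-- Two dimension tables and one fact table
CREATE TABLE Dim_Date (
    Date_Key    INT PRIMARY KEY,
    Full_Date   DATE,
    Month_Name  VARCHAR(20),
    Year_Number INT
);

CREATE TABLE Dim_Product (
    Product_Key  INT PRIMARY KEY,
    Product_Name VARCHAR(100),
    Category     VARCHAR(50)
);

CREATE TABLE Fact_Sales (
    Date_Key     INT REFERENCES Dim_Date (Date_Key),
    Product_Key  INT REFERENCES Dim_Product (Product_Key),
    Quantity     INT,           -- measure
    Sales_Amount DECIMAL(12,2)  -- measure
);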


Matching data models with data modeling techniques:
-----------------------------------------------------------------

* Conceptual data model → Entity-relationship (ER) modeling
* Logical data model → ER, dimensional, and object-oriented modeling
* Physical data model → Dimensional, object-oriented, and NoSQL modeling


Normalize early, denormalize when necessary
-------------------------------------------------------

- Normalization is a foundational concept in data modeling. It organizes data so that redundancy is minimized and consistency is ensured by structuring it into smaller, logically related tables.
- This process reduces the chances of anomalies and discrepancies, making the data easier to manage and update.
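
For instance, here is a hedged sketch (with made-up table names) of normalizing customer details out of an orders table so they are stored only once:

-- Before normalization, customer name and city were repeated on every
-- order row; after splitting, they live in one place
CREATE TABLE Customers (
    Customer_ID   INT PRIMARY KEY,
    Customer_Name VARCHAR(100),
    City          VARCHAR(50)
);

CREATE TABLE Orders (
    Order_ID    INT PRIMARY KEY,
    Customer_ID INT REFERENCES Customers (Customer_ID),
    Order_Date  DATE,
    Order_Total DECIMAL(12,2)
);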


Future-proofing your data model :
-----------------------------------------
As businesses evolve, so do their data requirements. Designing a future-proof data model means creating one that’s flexible and scalable, ready to handle new data sources and changing demands. 

Imagine you're designing a database for an online store. Initially you track only domestic orders, but later you expand globally. If your schema supports only a single currency, you'll need major modifications.

Instead of a simple Price column, include Currency_Code and Exchange_Rate fields. Use a flexible schema that allows new attributes to be added without restructuring the database.
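
A hedged sketch of such an orders table (the table name and columns other than Currency_Code and Exchange_Rate are assumptions):

-- Orders table designed for global expansion
CREATE TABLE Global_Orders (
    Order_ID      INT PRIMARY KEY,
    Price         DECIMAL(12,2) NOT NULL,  -- amount in the original currency
    Currency_Code CHAR(3)       NOT NULL,  -- e.g. 'USD', 'EUR', 'INR'
    Exchange_Rate DECIMAL(12,6) NOT NULL,  -- rate to the reporting currency
    Order_Date    DATE
);

-- Reporting in a single currency then becomes a simple calculation
SELECT Order_ID, Price * Exchange_Rate AS Price_In_Reporting_Currency
FROM Global_Orders;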


=================== Normalization in SQL (1NF - 5NF) ============================

Database normalization is an important process used to organize and structure relational databases. This process ensures that data is stored in a way that minimizes redundancy, simplifies querying, and improves data integrity.

In simpler terms, it involves breaking down a large, complex table into smaller and simpler tables while maintaining data relationships. Normalization is commonly used when dealing with large datasets. 

If a table is not properly normalized and has data redundancy, it will not only take up extra data storage space but also make it difficult to handle and update the database.

First Normal Form (1NF):
---------------------------------
A table is in First Normal Form when each column contains only a single, atomic value, each record is unique, and there are no repeating groups; multi-valued columns are split into separate rows or a separate table.
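
A minimal before-and-after sketch, assuming a made-up employees table whose phone column violates 1NF:

-- Not in 1NF: Phone_Numbers packs several values into one cell, e.g.
--   1 | Asha | '555-1111, 555-2222'

-- In 1NF: one atomic value per column; the repeating group moves
-- to its own table
CREATE TABLE Employees (
    Employee_ID INT PRIMARY KEY,
    Name        VARCHAR(100)
);

CREATE TABLE Employee_Phones (
    Employee_ID  INT REFERENCES Employees (Employee_ID),
    Phone_Number VARCHAR(20),
    PRIMARY KEY (Employee_ID, Phone_Number)
);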

Sunday, June 23, 2019

Types Of Keys In Database

Primary Key :-
---------------
A primary key is a single field or combination of fields that uniquely identifies a row in the table.

The following are rules that make a column a primary key:
- A primary key column cannot contain a NULL value
- A primary key value must be unique within the entire table
- A primary key value should not be changed over time
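
For example (table and column names are illustrative), a primary key is declared in SQL like this:

-- Single-column primary key: NOT NULL and unique by definition
CREATE TABLE Departments (
    Department_ID   INT PRIMARY KEY,
    Department_Name VARCHAR(100)
);

-- Composite primary key: a combination of fields identifies the row
CREATE TABLE Enrollments (
    Student_ID INT,
    Course_ID  INT,
    PRIMARY KEY (Student_ID, Course_ID)
);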


Foreign Key :-
---------------
A foreign key is a column (or set of columns) whose values must also appear in another table.
The referenced table is called the parent table while the table with the foreign key is called the child table.
The foreign key in the child table will generally reference a primary key in the parent table.
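
A short sketch of a parent/child pair (table names are assumptions):

-- Parent table
CREATE TABLE Customers (
    Customer_ID   INT PRIMARY KEY,
    Customer_Name VARCHAR(100)
);

-- Child table: every Customer_ID here must exist in Customers
CREATE TABLE Orders (
    Order_ID    INT PRIMARY KEY,
    Customer_ID INT,
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (Customer_ID) REFERENCES Customers (Customer_ID)
);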


Natural Key and Surrogate Key:-
------------------------------------
Sometimes the primary key is made up of real data; such keys are normally referred to as natural keys. Other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key.

A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. By "real data" I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record.

A surrogate key, by contrast, has no natural relationship with the rest of the columns in a table. It is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time, right before the record is inserted into a table.
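
A hedged sketch of the two kinds of key (the IDENTITY syntax shown is SQL Server-style; MySQL uses AUTO_INCREMENT and Oracle/PostgreSQL typically use sequences):

-- Natural key: real-world data identifies the row
CREATE TABLE Products_Natural (
    Product_Code VARCHAR(20) PRIMARY KEY,  -- e.g. a manufacturer SKU
    Product_Name VARCHAR(100)
);

-- Surrogate key: a generated value identifies the row
CREATE TABLE Products_Surrogate (
    Product_ID   INT IDENTITY(1,1) PRIMARY KEY,
    Product_Code VARCHAR(20),
    Product_Name VARCHAR(100)
);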

Friday, April 12, 2019

Datamarts

A Datamart is a subset (a part) of the Data Warehouse. A Datamart focuses on a particular subject or a small business unit, whereas the Data Warehouse is one unit which contains all the business units.
Examples of Datamarts are sales, finance, HR, and so on.



There are two approaches for designing Datamarts or a Data Warehouse.

Bottom-up Design : This approach was introduced by Ralph Kimball. Here the Datamarts are created first and are later integrated to build the complete Data Warehouse.

Top-down Design : This approach was introduced by Bill Inmon. Here the Data Warehouse is created first, and the Datamarts are then derived from it.

Dimensions

A Dimension stores the textual description of the business. It holds the details related to business transactions, such as place, date, and name, which cannot be measured.
The different types of dimensions are below.

Conformed Dimension : This dimension means exactly the same thing with every possible fact table to which it is joined. In simple words, it is one common dimension shared by multiple fact tables.
The Date dimension is the perfect example of it.




Junk Dimension : All the attributes which do not fit into any category are stored together in one place, and this structure is called a Junk dimension.
Gender, Marital status, and Flag columns are the perfect examples of it.




Role Playing Dimension : A dimension which is used for multiple purposes within the same database is called a "Role Playing Dimension" (see the sketch after this list).
Date of Sale and Date of Delivery are the perfect example of it.

Degenerate Dimension : This dimension is derived from the fact table and does not have its own dimension table.
Transaction code and Invoice number are the perfect examples of it.
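
As a minimal sketch of a role-playing dimension (table and column names are assumptions), one Dim_Date table plays two roles by being joined twice:

-- The same date dimension serves as both sale date and delivery date
SELECT f.Order_ID,
       sale_d.Full_Date    AS Date_Of_Sale,
       deliver_d.Full_Date AS Date_Of_Delivery
FROM   Fact_Orders f
JOIN   Dim_Date sale_d    ON f.Sale_Date_Key     = sale_d.Date_Key
JOIN   Dim_Date deliver_d ON f.Delivery_Date_Key = deliver_d.Date_Key;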



Facts

Facts are also known as "Measures". Facts are used to measure business values and to forecast the future of the business.

The different types of facts are

Additive facts : The facts that can be summed up across all the dimensions in the fact table. A sales amount is a good example of an additive fact.


Semi additive facts : The facts that can be summed up across some of the dimensions in the fact table, but not all. Daily balance is the perfect example of a semi additive fact (see the sketch after this list).

Non additive facts : The facts that cannot be summed up across any of the dimensions in the fact table. Percentages and ratios are the perfect examples of non additive facts.

Factless Fact : A fact table which does not contain any measures or facts is called a factless fact table. A table holding only a Product key and a Date key, recording that an event occurred, is the perfect example of a factless fact.
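
A hedged sketch of why a daily balance is semi additive (all names are assumptions): it can be summed across accounts for one day, but not across days for one account.

-- Meaningful: total balance across all accounts on a single day
SELECT Date_Key, SUM(Balance) AS Total_Balance
FROM   Fact_Daily_Balance
WHERE  Date_Key = 20190412
GROUP  BY Date_Key;

-- Not meaningful to SUM across time; an average (or the last value)
-- is typically used across the date dimension instead
SELECT Account_Key, AVG(Balance) AS Avg_Balance
FROM   Fact_Daily_Balance
GROUP  BY Account_Key;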




Dimension Modelling (Schemas)

A Data Warehouse is designed based on schemas. Whenever we load data into the warehouse, it should be in the form of a schema. In simple words, a schema is nothing but the structure of the data.

This structure is built mainly on two components, called "Facts and Dimensions". Based on this, the complete data is divided into two parts: Facts and Dimensions.

These schemas are categorized into 3 different types based on their characteristics:

Star Schema : It is called a "Star Schema" because its structure is similar to a star. It consists of both dimension and fact tables: one fact table at the center, surrounded by denormalized (duplicated) dimension tables. It can have more than one fact table (a sketch contrasting star and snowflake appears after this list).




Snow Flake Schema : It is an extension of the star schema which adds additional dimension tables. Here the many dimension tables are normalized, so less duplicate data is stored than in a star schema. Snowflaking affects only the dimension tables, not the fact tables.




Galaxy Schema : It has many fact tables surrounded by dimension tables, where the fact tables share many common dimension tables.
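
A minimal sketch of the star-versus-snowflake difference (all names are assumptions): in a star schema the product dimension is one denormalized table, while snowflaking normalizes its category out into a separate table.

-- Star schema: denormalized dimension (category repeated per product)
CREATE TABLE Dim_Product_Star (
    Product_Key   INT PRIMARY KEY,
    Product_Name  VARCHAR(100),
    Category_Name VARCHAR(50)
);

-- Snowflake schema: the same dimension normalized into two tables
CREATE TABLE Dim_Category (
    Category_Key  INT PRIMARY KEY,
    Category_Name VARCHAR(50)
);

CREATE TABLE Dim_Product_Snowflake (
    Product_Key  INT PRIMARY KEY,
    Product_Name VARCHAR(100),
    Category_Key INT REFERENCES Dim_Category (Category_Key)
);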






Introduction To Data Warehousing

Data warehousing concepts are very important for understanding the IT work environment. So what exactly is data warehousing? How does it work? Let's start understanding it.

When a business starts, it stores all its financial data for future analysis. After some time it becomes difficult to handle this huge amount of data, or historical data.
So, in order to handle this, all the data is stored or moved to a particular place; that place is called a "data warehouse", and the process is called "data warehousing".

The system where all the business data is first stored by the organization for a period of time is called "OLTP", or "Online Transaction Processing". When the data is later moved by an ETL process, the target system is called "OLAP", or "Online Analytical Processing".

OLTP holds real-time data and stores it for a short time (also known as the Database).
OLAP holds historical data used for analytical purposes (also known as the Data Warehouse).

All this movement of data is done with ETL tools, and one of the best tools available in the market is "Informatica PowerCenter".
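
As a minimal illustration of the ETL idea in plain SQL (table names are assumptions; a real tool like Informatica PowerCenter manages this through mappings and workflows), data is extracted from an OLTP table, transformed, and loaded into a warehouse table:

-- Extract from the OLTP orders table, transform (aggregate by day),
-- and load into a warehouse fact table
INSERT INTO DW_Fact_Daily_Sales (Sale_Date, Total_Quantity, Total_Amount)
SELECT Order_Date,
       SUM(Quantity),
       SUM(Quantity * Unit_Price)
FROM   OLTP_Orders
GROUP  BY Order_Date;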




Teradata Database

  How Teradata is different from other database? - Parallelism : Massive Parallel Processing(MPP) Architecture. It does parallel processi...