
Tuesday, November 4, 2025

Data Modeling

- Data modeling is the process of creating a visual representation of data and its relationships. It serves as a blueprint for how data is structured, stored, and accessed, ensuring consistency and clarity in data management.

- A well-designed model helps prevent common issues like redundancy, performance bottlenecks, and difficulty adapting to future changes.  


Types of Data Models:
---------------------------
- Conceptual Data Model: Defines key business entities and their relationships without getting into technical details.

- Logical Data Model: Defines how the data is structured. It includes the data's attributes, relationships, and constraints, and acts as a bridge between business requirements and the physical implementation of the data.

- Physical Data Model: Defines how the data is stored in a database, including specific table structures, indexes, and storage mechanisms.
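
To make the contrast concrete, here is a minimal sketch of a physical data model in SQL for a hypothetical Customer entity (all table, column, and index names are illustrative assumptions, not from any particular system):

-- Physical model: exact column types, constraints, and access structures
CREATE TABLE Customer (
    Customer_ID INT          NOT NULL,
    Full_Name   VARCHAR(100) NOT NULL,
    Email       VARCHAR(255),
    Created_At  DATE,
    CONSTRAINT PK_Customer PRIMARY KEY (Customer_ID)
);

-- The physical model also specifies indexes and storage details
CREATE INDEX IX_Customer_Email ON Customer (Email);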


Data Modeling Techniques:
---------------------------------
Data modeling is not a one-size-fits-all process. Different techniques are employed depending on the complexity of the data and the goals. There are four different data modeling techniques:

* Entity-relationship (ER) modeling: Represents data in terms of entities, the relationships between them, and their attributes.
- Entities
- Relationships
- Attributes

* Dimensional modeling: Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions (see the sketch after this list).
- Fact table
- Dimension tables

* Object-oriented modeling: Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects.
- This approach is particularly beneficial in object-oriented programming (OOP) languages like Java and Python, where data models can be directly mapped to classes and objects.

* NoSQL and document-based modeling: NoSQL and document-based modeling techniques are designed for flexible, schema-less databases.
- In NoSQL databases like MongoDB, a document-based model organizes data into collections of documents, where each document can have a unique structure.
- This flexibility allows for faster iteration and scaling, particularly in big data environments or applications requiring high-speed data access.
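
As a minimal sketch of dimensional modeling (table and column names are illustrative assumptions), a fact table stores measures and references the dimension tables by their keys:

-- Two dimension tables and one fact table
CREATE TABLE Dim_Date (
    Date_Key    INT PRIMARY KEY,
    Full_Date   DATE,
    Month_Name  VARCHAR(20),
    Year_Number INT
);

CREATE TABLE Dim_Product (
    Product_Key  INT PRIMARY KEY,
    Product_Name VARCHAR(100),
    Category     VARCHAR(50)
);

CREATE TABLE Fact_Sales (
    Date_Key     INT REFERENCES Dim_Date (Date_Key),
    Product_Key  INT REFERENCES Dim_Product (Product_Key),
    Quantity     INT,           -- measure
    Sales_Amount DECIMAL(12,2)  -- measure
);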


Matching data models with data modeling techniques:
-----------------------------------------------------------------

* Conceptual data model → Entity-relationship (ER) modeling
* Logical data model → ER, dimensional, and object-oriented modeling
* Physical data model → Dimensional, object-oriented, and NoSQL modeling


Normalize early, denormalize when necessary
-------------------------------------------------------

- Normalization is a foundational concept in data modeling. It organizes data so that redundancy is minimized and consistency is ensured by structuring it into smaller, logically related tables.
- This process reduces the chances of anomalies and discrepancies, making the data easier to manage and update.
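
For instance, here is a hedged sketch (with made-up table names) of normalizing customer details out of an orders table so they are stored only once:

-- Before normalization, customer name and city were repeated on every
-- order row; after splitting, they live in one place
CREATE TABLE Customers (
    Customer_ID   INT PRIMARY KEY,
    Customer_Name VARCHAR(100),
    City          VARCHAR(50)
);

CREATE TABLE Orders (
    Order_ID    INT PRIMARY KEY,
    Customer_ID INT REFERENCES Customers (Customer_ID),
    Order_Date  DATE,
    Order_Total DECIMAL(12,2)
);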


Future-proofing your data model :
-----------------------------------------
As businesses evolve, so do their data requirements. Designing a future-proof data model means creating one that’s flexible and scalable, ready to handle new data sources and changing demands. 

Imagine you're designing a database for an online store. Initially you track only domestic orders, but later you expand globally. If your schema supports only a single currency, you'll need major modifications.

Instead of a simple Price column, include Currency_Code and Exchange_Rate fields. Use a flexible schema that allows new attributes to be added without restructuring the database.
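
A hedged sketch of such an orders table (the table name and columns other than Currency_Code and Exchange_Rate are assumptions):

-- Orders table designed for global expansion
CREATE TABLE Global_Orders (
    Order_ID      INT PRIMARY KEY,
    Price         DECIMAL(12,2) NOT NULL,  -- amount in the original currency
    Currency_Code CHAR(3)       NOT NULL,  -- e.g. 'USD', 'EUR', 'INR'
    Exchange_Rate DECIMAL(12,6) NOT NULL,  -- rate to the reporting currency
    Order_Date    DATE
);

-- Reporting in a single currency then becomes a simple calculation
SELECT Order_ID, Price * Exchange_Rate AS Price_In_Reporting_Currency
FROM Global_Orders;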


=================== Normalization in SQL (1NF - 5NF) ============================

Database normalization is an important process used to organize and structure relational databases. This process ensures that data is stored in a way that minimizes redundancy, simplifies querying, and improves data integrity.

In simpler terms, it involves breaking down a large, complex table into smaller and simpler tables while maintaining data relationships. Normalization is commonly used when dealing with large datasets. 

If a table is not properly normalized and has data redundancy, it will not only take up extra data storage space but also make it difficult to handle and update the database.

First Normal Form (1NF):
---------------------------------
A table is in First Normal Form when each column contains only a single, atomic value, each record is unique, and there are no repeating groups; multi-valued columns are split into separate rows or a separate table.
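
A minimal before-and-after sketch, assuming a made-up employees table whose phone column violates 1NF:

-- Not in 1NF: Phone_Numbers packs several values into one cell, e.g.
--   1 | Asha | '555-1111, 555-2222'

-- In 1NF: one atomic value per column; the repeating group moves
-- to its own table
CREATE TABLE Employees (
    Employee_ID INT PRIMARY KEY,
    Name        VARCHAR(100)
);

CREATE TABLE Employee_Phones (
    Employee_ID  INT REFERENCES Employees (Employee_ID),
    Phone_Number VARCHAR(20),
    PRIMARY KEY (Employee_ID, Phone_Number)
);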

Sunday, June 23, 2019

Types Of Keys In Database

Primary Key :-
---------------
A primary key is a single field or combination of fields that uniquely identifies a row in the table.

The following are rules that make a column a primary key:
- A primary key column cannot contain a NULL value
- A primary key value must be unique within the entire table
- A primary key value should not be changed over time
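
For example (table and column names are illustrative), a primary key is declared in SQL like this:

-- Single-column primary key: NOT NULL and unique by definition
CREATE TABLE Departments (
    Department_ID   INT PRIMARY KEY,
    Department_Name VARCHAR(100)
);

-- Composite primary key: a combination of fields identifies the row
CREATE TABLE Enrollments (
    Student_ID INT,
    Course_ID  INT,
    PRIMARY KEY (Student_ID, Course_ID)
);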


Foreign Key :-
---------------
A foreign key is a column (or set of columns) whose values must also appear in another table.
The referenced table is called the parent table while the table with the foreign key is called the child table.
The foreign key in the child table will generally reference a primary key in the parent table.
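
A short sketch of a parent/child pair (table names are assumptions):

-- Parent table
CREATE TABLE Customers (
    Customer_ID   INT PRIMARY KEY,
    Customer_Name VARCHAR(100)
);

-- Child table: every Customer_ID here must exist in Customers
CREATE TABLE Orders (
    Order_ID    INT PRIMARY KEY,
    Customer_ID INT,
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (Customer_ID) REFERENCES Customers (Customer_ID)
);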


Natural Key and Surrogate Key:-
------------------------------------
Sometimes the primary key is made up of real data; such keys are normally referred to as natural keys. Other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key.

A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data. By "real data" I mean data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record.

A surrogate key, by contrast, has no natural relationship with the rest of the columns in a table. It is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time, right before the record is inserted into a table.
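
A hedged sketch of the two kinds of key (the IDENTITY syntax shown is SQL Server-style; MySQL uses AUTO_INCREMENT and Oracle/PostgreSQL typically use sequences):

-- Natural key: real-world data identifies the row
CREATE TABLE Products_Natural (
    Product_Code VARCHAR(20) PRIMARY KEY,  -- e.g. a manufacturer SKU
    Product_Name VARCHAR(100)
);

-- Surrogate key: a generated value identifies the row
CREATE TABLE Products_Surrogate (
    Product_ID   INT IDENTITY(1,1) PRIMARY KEY,
    Product_Code VARCHAR(20),
    Product_Name VARCHAR(100)
);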

Friday, April 12, 2019

Datamarts

A Datamart is a subset (a part) of the Data Warehouse. A Datamart focuses on a particular subject or a small business unit, whereas the Data Warehouse is one unit which contains all the business units.
Examples of Datamarts are sales, finance, HR, and so on.



There are two approaches for designing Datamarts or a Data Warehouse.

Bottom-up Design : This approach was introduced by Ralph Kimball. Here the Datamarts are created first and are later integrated to build the complete Data Warehouse.

Top-down Design : This approach was introduced by Bill Inmon. Here the Data Warehouse is created first, and the Datamarts are then derived from it.

Dimensions

A Dimension stores the textual description of the business. It holds the details related to business transactions, such as place, date, and name, which cannot be measured.
The different types of dimensions are below.

Conformed Dimension : This dimension means exactly the same thing with every possible fact table to which it is joined. In simple words, it is one common dimension shared by multiple fact tables.
The Date dimension is the perfect example of it.




Junk Dimension : All the attributes which do not fit into any category are stored together in one place, and this structure is called a Junk dimension.
Gender, Marital status, and Flag columns are the perfect examples of it.




Role Playing Dimension : A dimension which is used for multiple purposes within the same database is called a "Role Playing Dimension" (see the sketch after this list).
Date of Sale and Date of Delivery are the perfect example of it.

Degenerate Dimension : This dimension is derived from the fact table and does not have its own dimension table.
Transaction code and Invoice number are the perfect examples of it.
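
As a minimal sketch of a role-playing dimension (table and column names are assumptions), one Dim_Date table plays two roles by being joined twice:

-- The same date dimension serves as both sale date and delivery date
SELECT f.Order_ID,
       sale_d.Full_Date    AS Date_Of_Sale,
       deliver_d.Full_Date AS Date_Of_Delivery
FROM   Fact_Orders f
JOIN   Dim_Date sale_d    ON f.Sale_Date_Key     = sale_d.Date_Key
JOIN   Dim_Date deliver_d ON f.Delivery_Date_Key = deliver_d.Date_Key;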



Facts

Facts are also known as "Measures". Facts are used to measure business values and to forecast the future of the business.

The different types of facts are

Additive facts : The facts that can be summed up across all the dimensions in the fact table. A sales amount is a good example of an additive fact.


Semi additive facts : The facts that can be summed up across some of the dimensions in the fact table, but not all. Daily balance is the perfect example of a semi additive fact (see the sketch after this list).

Non additive facts : The facts that cannot be summed up across any of the dimensions in the fact table. Percentages and ratios are the perfect examples of non additive facts.

Factless Fact : A fact table which does not contain any measures or facts is called a factless fact table. A table holding only a Product key and a Date key, recording that an event occurred, is the perfect example of a factless fact.
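
A hedged sketch of why a daily balance is semi additive (all names are assumptions): it can be summed across accounts for one day, but not across days for one account.

-- Meaningful: total balance across all accounts on a single day
SELECT Date_Key, SUM(Balance) AS Total_Balance
FROM   Fact_Daily_Balance
WHERE  Date_Key = 20190412
GROUP  BY Date_Key;

-- Not meaningful to SUM across time; an average (or the last value)
-- is typically used across the date dimension instead
SELECT Account_Key, AVG(Balance) AS Avg_Balance
FROM   Fact_Daily_Balance
GROUP  BY Account_Key;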




Dimension Modelling (Schemas)

A Data Warehouse is designed based on schemas. Whenever we load data into the warehouse, it should be in the form of a schema. In simple words, a schema is nothing but the structure of the data.

This structure is built mainly on two components, called "Facts and Dimensions". Based on this, the complete data is divided into two parts: Facts and Dimensions.

These schemas are categorized into 3 different types based on their characteristics:

Star Schema : It is called a "Star Schema" because its structure is similar to a star. It consists of both dimension and fact tables: one fact table at the center, surrounded by denormalized (duplicated) dimension tables. It can have more than one fact table (a sketch contrasting star and snowflake appears after this list).




Snow Flake Schema : It is an extension of the star schema which adds additional dimension tables. Here the many dimension tables are normalized, so less duplicate data is stored than in a star schema. Snowflaking affects only the dimension tables, not the fact tables.




Galaxy Schema : It has many fact tables surrounded by dimension tables, where the fact tables share many common dimension tables.
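
A minimal sketch of the star-versus-snowflake difference (all names are assumptions): in a star schema the product dimension is one denormalized table, while snowflaking normalizes its category out into a separate table.

-- Star schema: denormalized dimension (category repeated per product)
CREATE TABLE Dim_Product_Star (
    Product_Key   INT PRIMARY KEY,
    Product_Name  VARCHAR(100),
    Category_Name VARCHAR(50)
);

-- Snowflake schema: the same dimension normalized into two tables
CREATE TABLE Dim_Category (
    Category_Key  INT PRIMARY KEY,
    Category_Name VARCHAR(50)
);

CREATE TABLE Dim_Product_Snowflake (
    Product_Key  INT PRIMARY KEY,
    Product_Name VARCHAR(100),
    Category_Key INT REFERENCES Dim_Category (Category_Key)
);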






Introduction To Data Warehousing

Data warehousing concepts are very important for understanding the IT work environment. So what exactly is data warehousing? How does it work? Let's start understanding it.

When a business starts, it stores all its financial data for future analysis. After some time it becomes difficult to handle this huge amount of data, or historical data.
So, in order to handle this, all the data is stored or moved to a particular place; that place is called a "data warehouse", and the process is called "data warehousing".

The system where all the business data is first stored by the organization for a period of time is called "OLTP", or "Online Transaction Processing". When the data is later moved by an ETL process, the target system is called "OLAP", or "Online Analytical Processing".

OLTP holds real-time data and stores it for a short time (also known as the Database).
OLAP holds historical data used for analytical purposes (also known as the Data Warehouse).

All this movement of data is done with ETL tools, and one of the best tools available in the market is "Informatica PowerCenter".
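
As a minimal illustration of the ETL idea in plain SQL (table names are assumptions; a real tool like Informatica PowerCenter manages this through mappings and workflows), data is extracted from an OLTP table, transformed, and loaded into a warehouse table:

-- Extract from the OLTP orders table, transform (aggregate by day),
-- and load into a warehouse fact table
INSERT INTO DW_Fact_Daily_Sales (Sale_Date, Total_Quantity, Total_Amount)
SELECT Order_Date,
       SUM(Quantity),
       SUM(Quantity * Unit_Price)
FROM   OLTP_Orders
GROUP  BY Order_Date;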




Teradata Database

  How Teradata is different from other database? - Parallelism : Massive Parallel Processing(MPP) Architecture. It does parallel processi...