Tuesday, November 11, 2025

Teradata Database

  How is Teradata different from other databases?

- Parallelism : Teradata uses a Massively Parallel Processing (MPP) architecture, so it processes massive volumes of data in parallel.

- Shared Nothing : Each node works independently with its own resources, so if one node fails, the other nodes can take over its work. This is called fault tolerance.

- Linear Scalability : When extra capacity is needed, the system can be scaled out by adding nodes, up to 4,096 nodes.



 Different utilities :

 ---------------------

* Teradata provides several utilities to import/export data to and from the database:

* FastLoad : Used to load data from flat files into empty Teradata tables at high speed.

* MultiLoad : Used when you want to load data into multiple tables (up to five) in a single job; it also supports inserts, updates and deletes on populated tables.

* TPump : Used for continuous, low-volume (near real-time) loading of data into Teradata, row by row.

* FastExport : Used to export large volumes of data from Teradata tables to flat files.
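These utilities are script-driven. Below is a rough FastLoad sketch, just to show the shape of such a script; the names (mydb.emp, emp_data.csv) and credentials are assumptions for the example, and the target table is assumed to be empty.

LOGON tdpid/dbc,dbc;
SET RECORD VARTEXT ",";
DEFINE empno (VARCHAR(10)), ename (VARCHAR(30))
FILE = emp_data.csv;
BEGIN LOADING mydb.emp ERRORFILES mydb.err1, mydb.err2;
INSERT INTO mydb.emp (empno, ename)
VALUES (:empno, :ename);
END LOADING;
LOGOFF;

The script logs on, describes the comma-delimited input file, names two error tables, and loads the rows in parallel before logging off.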



 Teradata Architecture :

 -----------------------

Teradata is built for massively parallel processing (MPP). There are 4 main components in the Teradata architecture, i.e. Node, Parsing Engine (PE), Message Passing Layer (BYNET), and Access Module Processor (AMP).


Parsing Engine (PE) : It acts like a gatekeeper.

- Whenever you want to store or fetch data from the database, it checks your authentication and access rights.

- It parses your SQL query, checks the syntax, and converts it into a machine-readable form.

- It builds the optimal access path (execution plan) for the query and can reuse it whenever the same query is submitted again (see the EXPLAIN sketch below).
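You can look at the plan the Parsing Engine's optimizer produces by prefixing a query with EXPLAIN. A minimal sketch, assuming a hypothetical emp table:

EXPLAIN
SELECT ename, sal
FROM emp
WHERE deptno = 10;

The output describes the steps (which AMPs are involved, estimated row counts) that will be dispatched over the BYNET to the AMPs.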


BYNET : It stands for BanYan NETwork.

- BYNET is the communication highway inside Teradata that makes parallel processing possible.

- Teradata has many AMPs (workers) that process data in parallel. These AMPs and other parts (PEs, nodes) need a way to talk to each other.

  That’s what BYNET is — the network inside Teradata. 

- It delivers messages (like SQL requests and results) between the PEs (who receive your query) and the AMPs (who store and process the data).

- There are always two BYNETs (BYNET 0 and BYNET 1). If one goes down, the other takes over, so your queries don't stop.


AMP : It stands for Access Module Processor (a virtual processor with its own virtual disk).

- An AMP is not hardware. It’s a software process running inside a Teradata node.

- Each AMP is tied to its own slice of disk storage. Teradata may have hundreds of AMPs → that’s how it divides and conquers big data.

- It is a logical process (not a physical machine) that stores a portion of the table rows (on disk), processes queries on that portion (filtering, joining, aggregating), and returns results to the Parsing Engine (through the BYNET).


Disk Storage : 

- Disk Storage in Teradata is just the place where your actual data (tables, rows, indexes) is physically kept.

- Every AMP has its own disk space. So when data is loaded, Teradata spreads rows across different AMPs → which means across different disks.

- When you query, each AMP goes to its own disk storage, pulls its portion of the data, and processes it in parallel.
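To see how rows are spread across AMPs (and therefore across disks), Teradata's hashing functions can be queried. A small sketch, assuming a hypothetical emp table whose primary index is empno:

SELECT HASHAMP(HASHBUCKET(HASHROW(empno))) AS amp_no,   -- AMP that owns each row
       COUNT(*)                            AS row_count
FROM emp
GROUP BY 1
ORDER BY 1;

Each output row shows one AMP and how many emp rows landed on it; a fairly even spread means the primary index distributes data well.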




 Primary Index :

 ---------------

- Every table should have one (and only one) primary index.

- There are 2 types: Unique Primary Index (UPI) and Non-Unique Primary Index (NUPI).

- A primary index column can contain NULL values.

- The primary index can be modified only while the table is empty; once the table is populated, it cannot be changed.

- A primary index can be defined on a combination of up to 64 columns (an example definition is shown below).
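A minimal sketch of defining a unique primary index; the table and column names are assumptions for the example:

CREATE TABLE emp
( empno  INTEGER NOT NULL,
  ename  VARCHAR(30),
  deptno INTEGER
)
UNIQUE PRIMARY INDEX (empno);

The UPI value is hashed to decide which AMP stores each row, so an equality lookup on empno touches a single AMP and returns at most one row.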

 

 

 INDEXES :

 ---------

- Unique Primary Index (UPI) : Enforces uniqueness on the primary index column(s) and gives single-AMP, single-row access.

- Unique Secondary Index (USI) : An optional, unique alternate access path to the rows, maintained in a separate index subtable.

- Partitioned Primary Index (PPI) : Rows are still distributed to AMPs by the primary index hash, but within each AMP they are stored in partitions (for example by date), so queries on the partitioning column can skip irrelevant partitions and data retrieval is faster.
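A small sketch of a partitioned primary index, assuming a hypothetical sales table partitioned by month:

CREATE TABLE sales
( sale_id   INTEGER NOT NULL,
  sale_date DATE NOT NULL,
  amount    DECIMAL(10,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (sale_date BETWEEN DATE '2025-01-01'
                                AND     DATE '2025-12-31'
                                EACH INTERVAL '1' MONTH);

Queries that filter on sale_date can then scan only the matching monthly partitions on each AMP instead of the whole table.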



 RAID : (Redundant Array of Independent Disks)

 ------

- Data protection is provided at different levels: disk level, node level, AMP level, transaction level and object level.

- At the disk level, RAID 1 (mirroring) and RAID 5 (parity) are used so that data survives a disk failure.


 

 DIFFERENT SPACES :

 ------------------

Permanent space

- It is the physical disk space where data is permanently stored.

- Objects like tables and users are stored here. 


Spool space :

- Used for intermediate results of queries, such as calculations, subqueries and joins.


Temporary Space :

- Used by global temporary tables; the data is retained only for the duration of the session.
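Perm, spool and temporary space limits are set when a user or database is created. A minimal sketch with assumed names and sizes (values are in bytes):

CREATE USER sample_user FROM dbc AS
  PASSWORD  = sample_pwd,
  PERM      = 100e6,     -- permanent space for tables
  SPOOL     = 50e6,      -- spool space for intermediate results
  TEMPORARY = 50e6;      -- temporary space for global temporary tables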



 

 

 

=================================================================================================================================================




Teradata supports different types of tables :-

---------------------------------------------


Permanent table : This is the default table type; data inserted into it is stored permanently.


Volatile table : The data inserted into a volatile table is retained only during the user session. The table and its data are dropped at the end of the session. These tables are used to hold intermediate data during processing.


Global temporary table : The table definition is stored permanently in the data dictionary, but the data inserted into it is deleted at the end of the user session.


Derived table :  Derived table holds the intermediate results in a query. Their lifetime is within the query in which they are created, used and dropped.
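A short sketch of creating a volatile table and a global temporary table; the names and columns are assumptions for the example:

CREATE VOLATILE TABLE vt_emp
( empno INTEGER,
  sal   DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;          -- keep rows until the session ends

CREATE GLOBAL TEMPORARY TABLE gtt_emp
( empno INTEGER,
  sal   DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;          -- definition persists, data lasts only for the session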



Set Versus Multiset :-

---------------------

Teradata classifies the tables as SET or MULTISET tables based on how the duplicate records are handled. A table defined as SET table doesn’t store the duplicate records, whereas the MULTISET table can store duplicate records.
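A minimal sketch contrasting the two, with assumed table names:

CREATE SET TABLE emp_set            -- duplicate rows are not stored
( empno INTEGER,
  ename VARCHAR(30)
)
PRIMARY INDEX (empno);

CREATE MULTISET TABLE emp_multiset  -- duplicate rows are allowed
( empno INTEGER,
  ename VARCHAR(30)
)
PRIMARY INDEX (empno);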


Tuesday, November 4, 2025

SQL AND ORACLE PROCEDURE, TRIGGER AND FUNCTIONS DIFFERENCE

PROCEDURE:
===========

SQL:
------
CREATE OR ALTER PROCEDURE PROC_AVG_EMP_SAL (@IN_DEPTNO INT, @O_AVG_SAL INT OUTPUT)
AS
BEGIN
SELECT @O_AVG_SAL = AVG(SAL) FROM EMP WHERE DEPTNO=@IN_DEPTNO GROUP BY DEPTNO;
END;

ORACLE:
-------------
CREATE OR REPLACE PROCEDURE PROC_AVG_EMP_SAL (IN_DEPTNO IN INT, O_AVG_SAL OUT INT)
AS
BEGIN
SELECT AVG(SAL) INTO O_AVG_SAL FROM EMP WHERE DEPTNO=IN_DEPTNO GROUP BY DEPTNO;
END;
/

Note:
1) There is a difference in defining variables between SQL and Oracle.
2) There is a difference in assigning aggregations to variables between SQL and Oracle.
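How the two procedures can be called (a sketch; printing the result via SELECT / DBMS_OUTPUT is just for illustration):

SQL:
------
DECLARE @AVG_SAL INT;
EXEC PROC_AVG_EMP_SAL @IN_DEPTNO = 10, @O_AVG_SAL = @AVG_SAL OUTPUT;
SELECT @AVG_SAL AS AVG_SAL;

ORACLE:
-------------
DECLARE
V_AVG_SAL INT;
BEGIN
PROC_AVG_EMP_SAL(10, V_AVG_SAL);
DBMS_OUTPUT.PUT_LINE(V_AVG_SAL);
END;
/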


FUNCTION:-
=========

SQL:
------
CREATE OR ALTER FUNCTION FUN_AVG_SAL (@IN_DEPTNO INT)
RETURNS INT
AS
BEGIN
DECLARE @O_AVG_SAL INT;
SELECT @O_AVG_SAL = AVG(SAL)
FROM EMP
WHERE DEPTNO = @IN_DEPTNO
GROUP BY DEPTNO;
RETURN @O_AVG_SAL;
END;
GO

ORACLE:
-------------
CREATE OR REPLACE FUNCTION FUN_AVG_SAL (IN_DEPTNO IN NUMBER)
RETURN NUMBER
IS
O_AVG_SAL NUMBER;
BEGIN
SELECT AVG(SAL)
INTO O_AVG_SAL
FROM EMP
WHERE DEPTNO = IN_DEPTNO
GROUP BY DEPTNO;
RETURN O_AVG_SAL;
END;
/

NOTE:
1) SQL doesn't use 'IS'; it uses 'AS'.
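How the two functions can be called (assuming the default dbo schema on the SQL Server side):

SQL:
------
SELECT dbo.FUN_AVG_SAL(10) AS AVG_SAL;

ORACLE:
-------------
SELECT FUN_AVG_SAL(10) AS AVG_SAL FROM DUAL;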


TRIGGER:
========

SQL:
-------
CREATE OR ALTER TRIGGER INSERT_TRIGGER
ON EMP
AFTER INSERT
AS
BEGIN
INSERT INTO TRIGGER_TABLE (EMPNO, ENAME, SAL, DEPTNO)
SELECT EMPNO, ENAME, SAL, DEPTNO FROM inserted;
END;
GO


ORACLE:
-------------
CREATE OR REPLACE TRIGGER INSERT_TRIGGER
BEFORE INSERT ON EMP
FOR EACH ROW
BEGIN
INSERT INTO TRIGGER_TABLE (EMPNO, ENAME, SAL, DEPTNO)
VALUES (:NEW.EMPNO, :NEW.ENAME, :NEW.SAL, :NEW.DEPTNO);
END;
/

NOTE:
1) SQL Server does not support BEFORE triggers; it supports AFTER and INSTEAD OF triggers, and it uses the inserted/deleted pseudo-tables instead of Oracle's :NEW/:OLD row variables.

Data Modeling

- Data modeling is a detailed process that involves creating a visual representation of data and its relationships. It serves as a blueprint for how data is structured, stored, and accessed to ensure consistency and clarity in data management.

- A well-designed model helps prevent common issues like redundancy, performance bottlenecks, and difficulty adapting to future changes.  


Types of Data Models:
---------------------------
- Conceptual Data Model: It defines key business entities and relationships without getting into technical details.

- Logical Data Model: It defines how the data is structured, including attributes, relationships, and constraints; it acts as a bridge between business requirements and the physical implementation of the data.

- Physical Data Model: It defines how the data is stored in a database, such as specific table structures, indexes, and storage mechanisms.


Data Modeling Techniques:
---------------------------------
Data modeling is not a one-size-fits-all process. Different techniques are employed depending on the complexity of the data and the goals. There are 4 different data modeling techniques:

* Entity-relationship (ER) modeling
- Entities
- Relationships
- Attributes 

 * Dimensional modeling : Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions (see the star-schema sketch after this list).
- Fact table
- Dimension tables

* Object-oriented modeling: Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects.
- This approach is particularly beneficial in object-oriented programming (OOP) languages like Java and Python, where data models can be directly mapped to classes and objects.

* NoSQL and document-based modeling: NoSQL and document-based modeling techniques are designed for flexible, schema-less databases.
- In NoSQL databases like MongoDB, a document-based model organizes data into collections of documents, where each document can have a unique structure.
- This flexibility allows for faster iteration and scaling, particularly in big data environments or applications requiring high-speed data access.
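To make the fact/dimension idea in dimensional modeling concrete, here is a minimal star-schema sketch; the table and column names (fact_sales, dim_product, dim_date) are assumptions for the example:

CREATE TABLE dim_product
( product_id   INTEGER PRIMARY KEY,
  product_name VARCHAR(100),
  category     VARCHAR(50) );

CREATE TABLE dim_date
( date_id    INTEGER PRIMARY KEY,
  full_date  DATE,
  month_name VARCHAR(20),
  year_no    INTEGER );

CREATE TABLE fact_sales
( sale_id    INTEGER PRIMARY KEY,
  product_id INTEGER REFERENCES dim_product (product_id),   -- dimension key
  date_id    INTEGER REFERENCES dim_date (date_id),         -- dimension key
  quantity   INTEGER,                                       -- measure
  amount     DECIMAL(10,2) );                               -- measure

The fact table holds the measures and the foreign keys; each dimension table describes one business entity used to slice those measures.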


Matching data models with data modeling techniques:
-----------------------------------------------------------------

* Conceptual data model → Entity-relationship (ER) modeling
* Logical data model → ER, dimensional, and object-oriented modeling
* Physical data model → Dimensional, object-oriented, and NoSQL modeling


Normalize early, denormalize when necessary
-------------------------------------------------------

- Normalization is a foundational concept in data modeling. It organizes data so that redundancy is minimized and consistency is ensured by structuring it into smaller, logically related tables.
- This process reduces the chances of anomalies and discrepancies, making the data easier to manage and update.


Future-proofing your data model :
-----------------------------------------
As businesses evolve, so do their data requirements. Designing a future-proof data model means creating one that’s flexible and scalable, ready to handle new data sources and changing demands. 

Imagine you’re designing a database for an online store. Initially, you track only domestic orders, but later, you expand globally. You'll need major modifications later if your schema only supports a single currency.

Instead of a simple Price column, include Currency_Code and Exchange_Rate fields. Use a flexible schema that allows new attributes without restructuring the database.
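A small sketch of that multi-currency idea, using hypothetical table and column names:

CREATE TABLE orders
( order_id      INTEGER PRIMARY KEY,
  customer_id   INTEGER,
  price         DECIMAL(12,2),
  currency_code CHAR(3),           -- e.g. 'USD', 'EUR'
  exchange_rate DECIMAL(12,6),     -- rate to the reporting currency
  order_date    DATE );

With currency_code and exchange_rate in place, international orders fit the same schema later without restructuring it.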


=================== Normalization in SQL (1NF - 5NF) ============================

Database normalization is an important process used to organize and structure relational databases. This process ensures that data is stored in a way that minimizes redundancy, simplifies querying, and improves data integrity.

In simpler terms, it involves breaking down a large, complex table into smaller and simpler tables while maintaining data relationships. Normalization is commonly used when dealing with large datasets. 

If a table is not properly normalized and has data redundancy, it will not only take up extra data storage space but also make it difficult to handle and update the database.

First Normal Form (1NF):

- Each column should hold only atomic (indivisible) values, there should be no repeating groups or multi-valued columns, and every record should be uniquely identifiable.
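A small sketch of moving data into 1NF, with hypothetical student tables: instead of packing several phone numbers into one column, each value gets its own row.

-- Not in 1NF: one column holds a list of values
-- student_id | name | phone_numbers
-- 1          | Asha | 9991112222, 8883334444

CREATE TABLE student_phone
( student_id INTEGER,
  phone_no   VARCHAR(15),
  PRIMARY KEY (student_id, phone_no) );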











