Thursday, July 24, 2025

Oracle and SQL Server stored procedures differences

There are syntactical and functional differences between Oracle and SQL Server when it comes to writing stored procedures. 

Both use different procedural languages:
Oracle uses PL/SQL (Procedural Language/SQL)
SQL Server uses T-SQL (Transact-SQL)

Here's a comparison highlighting key syntactical differences:

1. Procedure Declaration

Oracle:
-------

CREATE OR REPLACE PROCEDURE proc_name (
    p_id   IN  NUMBER,
    p_name OUT VARCHAR2
)
AS
BEGIN
    -- PL/SQL block
END;


SQL Server:
-----------
CREATE PROCEDURE proc_name
    @id INT,
    @name NVARCHAR(100) OUTPUT
AS
BEGIN
    -- T-SQL block
END;


2. Variable Declaration

Oracle:
-------
DECLARE
    v_total NUMBER;


SQL Server:
-----------
DECLARE @total INT;


3. Calling a Stored Procedure with Output Parameters

Oracle:
-------
DECLARE
    v_name VARCHAR2(100);
BEGIN
    proc_name(101, v_name);
    DBMS_OUTPUT.PUT_LINE(v_name);
END;


SQL Server:
-----------
DECLARE @name NVARCHAR(100);
EXEC proc_name @id = 101, @name = @name OUTPUT;
PRINT @name;


4. Exception Handling

Oracle:
-------
BEGIN
    -- statements
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error');
END;


SQL Server:
-----------
BEGIN TRY
    -- statements
END TRY
BEGIN CATCH
    PRINT 'Error';
END CATCH;
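In practice you usually want the actual error details, not a fixed message. A minimal sketch (the INSERT target is illustrative): in SQL Server the CATCH block can read ERROR_NUMBER() and ERROR_MESSAGE(); in Oracle, SQLCODE and SQLERRM serve the same purpose.

```sql
-- SQL Server: surface the real error details in the CATCH block
BEGIN TRY
    INSERT INTO emp (empno, ename) VALUES (1, 'SMITH');  -- may violate a key
END TRY
BEGIN CATCH
    PRINT 'Error ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) + ': ' + ERROR_MESSAGE();
END CATCH;

-- Oracle: SQLCODE and SQLERRM give the same information in PL/SQL
BEGIN
    INSERT INTO emp (empno, ename) VALUES (1, 'SMITH');
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error ' || SQLCODE || ': ' || SQLERRM);
END;
```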


STORED PROCEDURES:
------------------

* Write a program to get department-wise salaries.

SQL:

CREATE OR ALTER PROCEDURE proc_depwise_salaries
    @in_deptno INT
AS
BEGIN
    SELECT sal FROM emp WHERE deptno = @in_deptno;
END;

EXEC proc_depwise_salaries @in_deptno = 10;


ORACLE:

CREATE OR REPLACE PROCEDURE Get_deptwise_sal (
    input_deptno IN NUMBER )
AS
BEGIN
    -- A bare SELECT is not allowed inside PL/SQL; loop over a cursor
    -- and print each salary instead
    FOR r IN (SELECT sal FROM emp WHERE deptno = input_deptno) LOOP
        DBMS_OUTPUT.PUT_LINE(r.sal);
    END LOOP;
END;
/

BEGIN
    Get_deptwise_sal(10);
END;
/
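Unlike T-SQL, an Oracle procedure cannot simply SELECT a result set back to the caller. A common pattern for that is an OUT parameter of type SYS_REFCURSOR; a minimal sketch (the procedure name here is illustrative):

```sql
CREATE OR REPLACE PROCEDURE Get_deptwise_sal_rc (
    input_deptno IN  NUMBER,
    p_result     OUT SYS_REFCURSOR )
AS
BEGIN
    -- Open the cursor over the query; the caller fetches the rows
    OPEN p_result FOR
        SELECT sal FROM emp WHERE deptno = input_deptno;
END;
/
```

In SQL*Plus or SQL Developer the caller can then do: `VARIABLE rc REFCURSOR`, `EXEC Get_deptwise_sal_rc(10, :rc);`, `PRINT rc`.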


Declaring Parameters in Stored Procedures:
-----------------------------------------------------
When you define input and output parameters for a stored procedure in SQL Server, you do not use the DECLARE keyword.

Example:
------------
CREATE PROCEDURE GetDeptSalaryStats
    @inputDeptNo INT,                 -- Input parameter
    @totalSalary DECIMAL(18,2) OUTPUT, -- Output parameter
    @averageSalary DECIMAL(18,2) OUTPUT -- Output parameter
AS
BEGIN
    -- No need to DECLARE again, these are already declared as parameters
    SELECT 
        @totalSalary = SUM(sal),
        @averageSalary = AVG(sal)
    FROM emp
    WHERE deptno = @inputDeptNo;
END;

Note:
-------
* @inputDeptNo, @totalSalary, and @averageSalary are parameters, not local variables.
* When parameters are declared in the procedure header, DECLARE is not used.
* You can use them directly inside the body of the procedure.

You use DECLARE only inside the procedure body or script to declare local variables, like this:
DECLARE @x INT = 10;
This is not a parameter but a local variable, usable only in that block or batch.

| Usage Context          | Syntax Example                       | Uses DECLARE? | Scope             |
| ---------------------- | ------------------------------------ | ------------- | ----------------- |
| Procedure parameters   | @deptno INT, @totSal DECIMAL OUTPUT  | No            | Across procedure  |
| Local variable         | DECLARE @x INT = 5;                  | Yes           | Inside procedure  |
| Ad-hoc script variable | DECLARE @deptno INT = 10;            | Yes           | In script/session |
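Putting the two together: in the calling script you DECLARE local variables and pass them to the procedure's OUTPUT parameters. A minimal sketch using the GetDeptSalaryStats procedure above:

```sql
DECLARE @tot DECIMAL(18,2), @avg DECIMAL(18,2);  -- local variables: DECLARE needed

EXEC GetDeptSalaryStats
     @inputDeptNo   = 10,
     @totalSalary   = @tot OUTPUT,
     @averageSalary = @avg OUTPUT;

PRINT 'Total: '   + CAST(@tot AS VARCHAR(20));
PRINT 'Average: ' + CAST(@avg AS VARCHAR(20));
```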




Sunday, March 9, 2025

Partitioning

Hive organizes the tables into partitions, and partitions are subdivided into buckets.

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)

ALTER TABLE employee
ADD PARTITION (year='2012')
LOCATION '/2012/part2012';

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

ALTER TABLE employee PARTITION (year='1203')
RENAME TO PARTITION (Yoj='1203');

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;

ALTER TABLE employee DROP IF EXISTS
PARTITION (year='1203');
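The ALTER statements above assume the table was created as a partitioned table in the first place. A minimal sketch (column names are illustrative):

```sql
-- Hive: the partition column is declared in PARTITIONED BY,
-- not in the regular column list
CREATE TABLE employee (
    eid INT,
    name STRING,
    salary STRING )
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```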


Create, Alter, Drop Databases and Tables

CREATE DATABASE [IF NOT EXISTS] userdb;

DROP DATABASE IF EXISTS userdb;

IF NOT EXISTS : This optional clause prevents an error if the database already exists when creating it; similarly, IF EXISTS prevents an error when dropping a database that does not exist.

SHOW DATABASES : This command is used to list the databases.


CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name

[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]

(OR)

CREATE TABLE IF NOT EXISTS employee (
eid int,
name String,
salary String,
destination String )

COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Brief explanation of the above syntax:
DELIMITED is the row format
'\t' is the field terminator
'\n' is the line terminator
TEXTFILE is the file type.

Generally, we insert data into SQL Server and Oracle databases using INSERT statements, but in Hive we load data using the LOAD DATA statement.

The LOAD DATA statement is used to load bulk records at once.

We have two options to load data
- Local file system
- Hadoop file system

SYNTAX:
-------------
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename 

[PARTITION (partcol1=val1, partcol2=val2 ...)]


QUERY:
-----------
LOAD DATA LOCAL INPATH '/home/user/sample.txt'

OVERWRITE INTO TABLE employee;

PARTITION is optional to use in the query.
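When the target table is partitioned, the PARTITION clause names the partition the file belongs to. A sketch assuming the employee table is partitioned by year (the file path is illustrative):

```sql
LOAD DATA LOCAL INPATH '/home/user/sample_2012.txt'
OVERWRITE INTO TABLE employee
PARTITION (year='2012');
```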

ALTER TABLE name RENAME TO new_name

ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])

ALTER TABLE name DROP [COLUMN] column_name

ALTER TABLE name CHANGE column_name new_name new_type

ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
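Concrete examples of the ALTER TABLE forms above, applied to the employee table. (Note that Hive has no direct DROP COLUMN; a column is removed by listing the columns you want to keep in REPLACE COLUMNS.)

```sql
ALTER TABLE employee RENAME TO emp;
ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');
ALTER TABLE employee CHANGE name ename STRING;
-- Drop the salary/destination columns by replacing the column list:
ALTER TABLE employee REPLACE COLUMNS (eid INT, ename STRING);
```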

DROP TABLE IF EXISTS employee;

Data Types

Data types in Hive are classified into 4 types
- Column Types
- Literals
- Null Values
- Complex Types

When the data range exceeds the range of INT, you need to use BIGINT and if the data range is smaller than the INT, you use SMALLINT. TINYINT is smaller than SMALLINT.

String literals can be specified using single quotes (' ') or double quotes (" "). Besides STRING, Hive offers two related types: VARCHAR and CHAR.

Timestamp ("YYYY-MM-DD HH:MM:SS.fffffffff")

Dates (YYYY-MM-DD)

Decimals : DECIMAL(precision, scale), e.g. DECIMAL(10,0)

Union Types : It is a collection of heterogeneous data types.

Null Value : Missing values are referred here as null values.

Arrays : Ordered collections of elements, used much like arrays in Java.

Maps : Key-value collections, similar to Java maps.

Structs : Group named fields, possibly of different types, into a single column.
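The complex types can be combined in a single table definition. A minimal sketch (table and column names are illustrative):

```sql
CREATE TABLE employee_details (
    name    STRING,
    skills  ARRAY<STRING>,                     -- like a Java array/list
    phones  MAP<STRING, BIGINT>,               -- like a Java map
    address STRUCT<city:STRING, zip:STRING> )  -- named fields grouped together
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
```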


Hadoop/Hive

The term big data refers to very large volumes of data. Handling such data the traditional way is very difficult, so the Apache Software Foundation created a framework called Hadoop to handle these problems.

Hadoop is an open-source framework to store and process big data. It contains two modules: the Hadoop Distributed File System (HDFS) and MapReduce.

MapReduce: It is used to process large volumes of structured, semi-structured, and unstructured data.

HDFS : It is used to store the data.

There are certain tools, like Sqoop, Pig, and Hive, that complement the Hadoop modules.

Hive is not a relational database; it is a query engine.

The import command (in Sqoop) is used to import data from a DBMS into Hive, and the export command is used to export data to a specific location.



Monday, January 27, 2025

Window Functions in SQL

Window functions perform calculations across a set of rows related to the current row without collapsing the rows into a single result. Unlike aggregate functions, window functions retain the row-level details while adding aggregated or ranked data as additional columns.

Window functions are also known as analytical functions; the two terms mean the same thing.

Key Components of Window Functions:

- OVER() Clause: Defines the window of rows to perform calculations on.
- PARTITION BY: Divides rows into groups or partitions for calculation.
- ORDER BY: Specifies the order of rows within the window.


Questions and Answers:
------------------------------

* How can I calculate a running total of sales for each customer?

SELECT
CustomerID,
OrderDate,
Amount,
SUM(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM Sales;

- `PARTITION BY CustomerID`: Groups rows by each customer.
- `ORDER BY OrderDate`: Orders rows by date within each group.
- `SUM(Amount)`: Adds up the sales amounts progressively.

* How can I rank employees based on their salary within each department?

SELECT
DepartmentID,
EmployeeID,
Salary,
RANK() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees;

- `RANK()`: Assigns a rank to each employee in descending salary order.
- `PARTITION BY DepartmentID`: Ensures the ranking is specific to each department.


* How can I calculate the percentile rank of students based on their scores?

SELECT
StudentID,
Score,
PERCENT_RANK() OVER (ORDER BY Score DESC) AS Percentile
FROM Students;

- `PERCENT_RANK()`: Calculates the relative standing of each score in the list.
- No `PARTITION BY` clause means the calculation applies to all rows.

* How can I find the sales amount of the previous order for each customer?

SELECT
CustomerID,
OrderID,
OrderDate,
Amount,
LAG(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PreviousOrderAmount
FROM Sales;

- `LAG(Amount)`: Returns the value of the previous row in the window.
- `PARTITION BY CustomerID`: Ensures the calculation is specific to each customer.


* How can I calculate the 3-month moving average of sales for each product?

SELECT
ProductID,
OrderDate,
Amount,
AVG(Amount) OVER (PARTITION BY ProductID ORDER BY OrderDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg
FROM Sales;

- `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`: Defines a sliding window of 3 rows.
- `AVG(Amount)`: Calculates the average over this window.

These examples demonstrate how window functions can provide powerful tools for advanced analytics while preserving row-level details.



Friday, December 13, 2024

List of taskflows in IICS

IICS provides a list of taskflow types: linear taskflow, taskflow, parallel tasks, parallel tasks with decision, sequential tasks, sequential tasks with decisions, and single task.

SEQUENTIAL TASK: This task is used to run different tasks sequentially.

OR

A sequential task is used in taskflows to define a series of tasks that need to be executed one after the other in a specific order. The main purpose of a sequential task is to manage and control the flow of tasks in a task flow, ensuring that each task completes before the next one begins. 


PARALLEL TASK: This task is used to run different tasks in parallel.

OR

The parallel task is used in taskflows to execute multiple tasks simultaneously rather than sequentially. It is designed for scenarios where tasks are independent of each other and can run in parallel to optimize processing time.

TASKFLOW TASK: Taskflow is used to automate and manage the flow of tasks in a specific sequence. It helps connect and control multiple tasks to complete a process efficiently.


Different types of indexes in SQL Server and Oracle

* What are the different types of indexes in Sql server and Oracle? Identify the differences. In SQL Server and Oracle, indexes are crucial ...