The different types of assets in IICS are
- Tasks
- Data ingestion and Replication
- Mappings
- Mapplets
- Taskflows
- Components
The different types of assets in IICS are
- Tasks
- Data ingestion and Replication
- Mappings
- Mapplets
- Taskflows
- Components
** What is Runtime Environment in IICS?
In Informatica Intelligent Cloud Services (IICS), a Runtime Environment is the execution engine that runs your tasks and data integration job. 
It defines where and how the integration jobs (mappings, synchronization tasks, replication tasks, etc.) will run.
 There are mainly two types:
  1. Cloud Hosted Runtime (IICS Cloud) => Informatica Cloud Hosted Agent
    * Managed completely by Informatica.
    * Best suited for cloud-to-cloud data integrations (e.g., Salesforce → Snowflake).
    * You don’t need to install anything locally.
  2. Secure Agent Runtime (On-Premise)	=> DESKTOP-XXXX
    * A lightweight program installed on your own server, VM, or machine.
    * Required when integrating on-premise systems (like SQL Server, Oracle, SAP) with cloud or other
       on-premise targets.
    * The Secure Agent connects securely with IICS and executes the jobs within your network.
 In simple terms:
  The runtime environment is the engine that processes your integration jobs. You select the appropriate environment (Cloud or Secure Agent) depending on whether  your data is in the cloud or on-premises.
  Runtime Environment ≈ Integration Service (conceptually)
** What are the different tasks in IICS?
   There are 7 different tasks available in IICS.
 1. Mapping Task (MTT)
   * It Runs a mapping that you build using the Mapping Designer.
   * Allows for complex data transformations (filter, join, expression, lookups, aggregations, etc.).
 2. Synchronization Task
   * Synchronization Task is one of the most commonly used tasks for simple data integration between a
      source and a target. 
    * Can perform Insert, Update, Upsert, or Delete operations.
   * Can perform Incremental loads like filtering records (e.g., “load only records updated today”).
        Example: A retail company wants to sync customer details from Salesforce to a SQL Server table 
                        every night.  
	           Synchronization Task (configured to Upsert records so new customers are added and                                existing ones are updated).
 3. Masking Task
   * Masking Task is used to protect sensitive or confidential data by replacing it with realistic but fictional values. This ensures that sensitive data can be
     safely used for testing, development, or analytics without exposing the real information.
   * Hides sensitive fields (like credit card numbers, SSNs, email IDs, phone numbers, salaries, etc.).
 4. Replication Task
   * Replication Task is used to copy or replicate data from a source to a target with minimal setup, usually for database or cloud application replication.
   * Moves entire tables or objects from source to target.
   * It has 2 Modes. Full load and Incremental load.
     	 Example: A company wants to replicate its Salesforce Accounts table into Snowflake every day.
			  Replication Task (incremental load so only new/updated accounts are replicated daily).
 5. PowerCenter Task
   * PowerCenter Task is used when you want to run and manage existing PowerCenter workflows from IICS, instead of rebuilding them in the cloud.
   * It connects IICS to your on-premise Informatica PowerCenter environment and lsets you trigger PowerCenter workflows directly from IICS.
   * If you already have hundreds of mappings and workflows in PowerCenter, you don’t need to rebuild them all in IICS immediately. Instead, you can  run them through IICS using a PowerCenter Task.  
    6. Dynamic Mapping Task
   * Instead of creating separate mappings for each source/target, you design one mapping and make it dynamic.   
      Example: A company gets daily sales files from 10 regions with the same structure but different names. Instead of creating 10 mappings, you build one dynamic  
			mapping task:
			Source file name = parameter
			Target table = parameter
			Transformation rules stay the same
			At runtime, DMT picks the right file and loads into the right target. 
   7. Data Transfer Task
   * Data Transfer Task (DTT) is a simple task used to move files from one location to another without doing any transformation.
   * A Data Transfer Task in IICS is used purely for file movement/transfer between systems, without performing any data transformation.
      Example: Suppose your business receives daily CSV files via SFTP from a partner.
			Use a Data Transfer Task to move the files from SFTP → Amazon S3.
			Run a Mapping Task to process and load them into Snowflake.
** What are the different taskflows available in IICS?
   There are 7 different taskflow available in IICS.
    1.	Linear Taskflow
   * The simplest type of taskflow (legacy style).
   * Runs tasks one after another in a straight sequence.
   * No branching or parallelism.  
Example: Task A → Task B → Task C
    2. Taskflow
   * The modern, flexible version of taskflows in IICS.
   * Supports sequential and parallel execution, decisions (IF/ELSE), loops, and parameter passing.
Example: You can say, “Run Task A and Task B in parallel → If Task B succeeds, run Task C, else stop.”
    3. Parallel Tasks
   * Executes multiple tasks at the same time (in parallel).
   * Saves time when tasks don’t depend on each other.
Example: Run Task A, Task B, and Task C simultaneously, then move to Task D once all complete.
  4.	Parallel Tasks with Decision
   * Similar to Parallel Tasks, but adds a decision point (IF/ELSE condition) after execution.
   * The next step depends on the success, failure, or data values of the parallel tasks.
     Example: Run Task A and Task B in parallel → If both succeed, run Task C; if any fails, run Task D.   
 5. Sequential Tasks
   * Runs tasks one after another in a defined order.
   * Ensures strict dependency control.
  6. Sequential Tasks with Decision
   * Same as Sequential Tasks, but introduces conditional branching (decision step).
   * Lets you choose different paths depending on outcomes.
Example: Run Task A → If success, run Task B → else run Task C.
    7. Single Task 
   * Simplest form of taskflow with just one task.
   * Useful when you only need scheduling or monitoring for a single task.
Example: Run just one Mapping Task daily at 10 PM
** What is Data Ingestion and Replication?
   Data Ingestion: 
   - It’s the process of bringing raw data from various sources into a system (like a database, data lake, or data warehouse).   
   - It’s about collecting and moving data, often in bulk or streaming.
   - Databases, files, APIs, message queues (Kafka, JMS), logs, IoT devices, etc.
   - Types of ingestion are Batch ingestion and Real-time ingestion
   - Batch ingestion: large sets of data at scheduled intervals.
   - Real-time ingestion – continuous data flow (like Kafka streaming).
      Replication Task:
   - The process of copying data from one system to another, usually table-to-table or object-to-object.
   - it’s about keeping a copy of data in sync between two systems.
   - It has 2 Modes. Full load and Incremental load.
     	 Example: A company wants to replicate its Salesforce Accounts table into Snowflake every day.
			 Replication Task (incremental load so only new/updated accounts are replicated daily).
** What are the different components available in IICS Assests?
 - Fixed-Width File Format		: Configure reusable formats for fixed-width flat files to use in mappings and mapping tasks.
 - Intelligent Structure Model	: After you create an intelligent structure model, you can associate it with a Structure Parser transformation and use the transformation in a mapping.  
 - User-Defined Function		: Create a reusable function to use in transformation expressions.
 - Hierarchical Schema 			: Upload an XML schema or an XML or JSON sample file to use with a Hierarchy transformation.
** Does synchronization task has transformations in IICS?
- Yes, in IICS a synchronization task can include transformations to modify, filter, or map data before loading it into the target system.  
- Transformations in a Synchronization Task are Filter, Expression, Lookup and Datamasking 
** Does data injection task has transformations in IICS?
- Yes, in IICS a Data Injection task can include transformations, depending on the use case and the type of task being executed. 
- If you're using a Mapping Task or Data Synchronization Task, you can apply transformations such as Expression, Aggregator, Router, Lookup, and Filter to  manipulate the data before injecting it into the target. 
There are mainly two types:
- If you're working with Cloud Application Integration (CAI), transformations can be applied using Process Designer or Service Connectors. 
** How to work on multiple files as source with indirect meathod?
- Filelist option
** How to work on multiple files as source if the any of the files is missing and job should be successful
- Command task and put batch file.
** How to handle multi-Database source and independent target table?
- Select connection system(Oracle)
- Select source type option "Multiple Objects" and then 
- Add Source Objects from the side right and select Emp table from the tables list.
- Click on Emp Object and Add related Objects to add Dept relational table. 
** How to handle Single Database source and multiple independent target tables?
 OR
  Break single table into multiple tables?
- Select connection system(Oracle)
- Select source type option "Single Object"
- Select Objects from the side right and select Emp table from the tables list.
- Select connection IICS_TGT21(Oracle)
- Select Target Type "Single Object" & Repeat it for all 3 target as per the requirement.
- Select Object and give target name & Repeat it for all 3 target as per the requirement.
** How to Create Mapping using Source Query?
- Select connection system(Oracle)
- Select source type option "Query"
- Click on Edit Query option and type your query in it & validate.
- Select connection IICS_TGT21(Oracle)
- Select Target Type "Single Object"
- Select Object and give target name
** How to Create Mapping with Dynamic file created every time when the job is executed.
- Select connection C_TGTFILE(Flat File)
- Select Target Type "Single Object"
- Select Object and give target name as 'CUSTOMER_ID_'||TO_CHAR(SYSDATE,'YYYYMMDDHH24MISS')||'.TXT'
** How to create Flow Run Order
- From the top right corner click 3 dots and select 'Flow Run Order'
** How to Implement Fixed Width File Format?
- Select New & from components create 'Fixed-Width-File Format'
- Select your connection and add sample Object.
- Edit column boundaries and give name to the columns.
- Select connection C_SRCFILE(Oracle)
- Select Source Type "Single Object"
- Select 'Emp' Object.
- Select Formatting Options 'Fixed Width'.
** How to create ISM Component for XML File?
- Select New & from components create 'Intellegent Structure Model'
- Give a name for Intelligent Structure Model and choose XML file.
** How to create ISM Component for JSON File?
- Select New & from components create 'Intellegent Structure Model'
- Give a name for Intelligent Structure Model and choose JSON file.
** How do you implement data load from multiple database source to multiple database targets in a single mapping with iics?
- To implement data load from multiple database sources to multiple database targets in a single mapping with Informatica Cloud (IICS)
  Create a parameterized mapping: Parameterize the source and target transformations.
  Create a Dynamic Mapping Task: Within the Dynamic Mapping Task, define multiple jobs with different sources and targets. 
** What is the latest mapping that you have used in iics?
** What type of join you use in expression for scd type2 scenario?
- Full Outer join type
* How do you implement below scenarios in iics just like scd 2
* If no record in target, insert should happen and flag column should be 'insert'
* If record difference is there between source and target, insert should happen and flag column should be 'insert'
* If no record difference is there between source and target, no insert should happen and flag column should be 'no record inserted'
* If record is matching between source and target, record should be deleted in target.
** What is the difference between Powercenter and IICS?
- Powercenter is a traditional, on-premise data integration tool where as IICS a cloud-based data integration platform (SaaS).
- IICS can be accessed via the internet, while PowerCenter requires client applications to be installed on the server.
- IICS can integrate and synchronize data and applications across cloud and on-premise environments. PowerCenter can only access on-premise data sets.
- IICS provides a process developer tool that allows developers to create connectors to access third-party applications. PowerCenter does not have a similar tool.
- IICS is built on a microservices-based architecture, while PowerCenter uses a client-server architecture.
** How to use stored procedures in IICS? How many ways are there?
- We don't have any specific stored procedure transformation in IICS. Use SQL Transformation & select SQL Type as Stored procedure and procedure next.
** How do you create loops in IICS?
** How do you send email in IICS?
** How do you pass parameters to stored procedures in IICS?
** How do you use views in IICS?
** What is the toughest scenario you faced in IICS?
** How to implement transaction control in IICS?
** How to implement synchronization task?
** How to implement replication task?
- Add source details & include related objects
- Add target connection details
- Add any field exclusions
- Add any data filters
** How many partitions are available in IICS?
- Fixed partition
- Pass through
- Key range
** Do you have access to metadata in IICS?
** How do you import power center jobs in iics?
** What the difference between Integration service & Repository service?
- Integration service is responsible to run the execute the jobs
- Repository service saves metadata in the repository.
** Informatica cloud architecture
** What are the list of services available in IICS?
Cloud Data Integration (CDI) and Cloud Application Integration (CAI) are products in Informatica Intelligent Cloud Services (IICS) that are used for data integration and application integration
** What the difference between input and in-out parameter?
- Input Parameter: A parameter that only passes values into the mapping or task at runtime
- In-Out Parameter: A parameter that can pass values into the mapping and also return values
Snowflake is the first clould based platform built it as dataware as services (daas). It can run of cloud providers like AWS, GCP and AZURE.
It doesn't require hardware and software to install and configure. It runs completely on public cloud infrastructure.
SNOWFLAKE ARCHITECTURE :
------------------------
* Data storage layer :
- Data is not stored internally. It's stored using external cloud provider upon our selection while create the snowflake account.
- Data stored in columns and compressed into blobs
* Query processing/Compute :
- It's called the muscle of the system.
- It's the layer where the queries are processed.
- It's made up of virtual warehouses. It executes tasks required for data processing.
- This layer processes MPP (Massive Parallel Processing)
* Cloud services :
- It's called the brain of the system.
- Managing infrastructure, access control, s ecurity optimizer and metadata are done here.
CREATING A WAREHOUSE :
----------------------
CREATE OR REPLACE WAREHOUSE FIRST_WAREHOUSE
WITH
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 300
SCALING_POLICY = 'Economy' (To maximize cost efficiency by scaling only when necessary)
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT =3
INITIALLY_SUSPENDED = TRUE
COMMENT = "This is our first warehouse"
LOADING DATA :
--------------
* BULK LOADING:
- It uses our own databases and warehouses
- It uses copy command
- Transformations
* CONTINIOUS LOADING:
- Small volumes of data
- Serverless
STAGES :
--------
It's the storage area used for data processing during the extract, transform and load process. There are 2 types Internal and External.
* EXTERNAL STORAGE
- External cloud provider
- S3 bucket
- Google cloud platform
- Microsoft Azure
- Database Object created in schema
* INTERNAL STORAGE
- Table stage => They are automatically defined, No setup needed and accessed only by table owner.
- LIST @~ => This command is used to display table stage.
- User stage => They are automatically defined, No setup needed and accessed only by user.
- LIST @%table => This command is used to display user stage.
- Named Internal stage
- LIST @table => This command is used to display Named stage.
- Local storage is maintained by snowflake
- Select $1, $2, $3 from stage_table is used to access the table
COPY COMMAND :
--------------
* Copy into <table_name> from <stage_name>
* Parameters:
- ON_ERROR = {CONTINUE/SKIP_FILE}
- VALIDATTION_MODE = RETURN_ROWS
(OR)
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS
FROM @aws_stage
file_format = (type = csv field_delimiter=',' skip_header=1);
PERFORMANCE TUNING :
--------------------
Most of the performance tuning is done by snowflake automatically.
* Dedicated warehouses
- To DBA, Data Scientists and Analysts
* Scaling up
- Increasing warehouses size
* Scaling down
- Adding warehouses/clusters
* Caching
- Automated process when query is executed
* Clustering
- Its automatically maintained by snowflake
* What is a Snowpipe?
- Snowpipe is Snowflake's continuous, fully managed data ingestion service.
- It automatically loads data into Snowflake from cloud storage services like Amazon S3, Azure Blob Storage, and Google Cloud Storage
Step of Setting up snowpipe :
-----------------------------
- Create Stage
- Copy Command
- Create Pipe
- S3 Notification
* What is Time travel?
- If the data or database is deleted or dropped then historical data is queried using time travel.
- Restored items like Query deleted, updated data, restored tables, schemas, databases, clones of tables.
eg: Select * from table AT (TIMESTAMP => timestamp)
Select * from table AT (OFFSET => -10*60)
Select * from table BEFORE (STATEMENT => query_id)
Undrop table table_name
Undrop schema schema_name
Undrop Database database_name
- Standard edition can travel back 1 day to the past whereas Enterprise, Business Critical and Virtual Private edition can travel back 90 days to the past.
It's also called as retention period.
- Data loss can happen when using time travel when you execute it 2nd time.
* What is fail safe?
- It is used to recover historical data when there is any disaster.
- It's not configurable, 7 day period for permanent tables.
- This state starts when the time travel ends.
- We need to reach snowflake to recoved it as we can't do anything on it as users.
DIFFERENT TYPES OF TABLES :
---------------------------
- They are created using create table command.
- Time travel retention period exists here.
- Fail safe exists here.
- They are created using create transient table command.
- Time travel retention period exists here.
- They are created using create temporary table command.
- Time travel retention period exists here.
- They only exists in current session.
External Tables, Hybrid Tables, and Iceberg Tables
CLONING TABLES :
----------------
* Zero copy cloning : It means data will not be copied in this cloning.
- It's used to clone datbases, schemas, tables, stage, pipe, stream and tasks.
- All the child objects will be clone but not internal stages.
- It's an independent table and will reference original table.
- It will not be effected when original tables is modified.
- Select previlage is enough for Tables
- Owner previlages are required for pipe, stream and tasks.
- Usage previlages for other objects
- Load history metadata is not copied.
- Cloning for specific point in time is possible.
- It's not possible to create temporary table into permanent table.
eg: Create table tablename clone table_source
SWAPPING TABLES :
-----------------
- It is used to take development code into production code.
- Metadata swapping is used here.
- It can be done on tables and schemas.
eg: Alter table table_name swap with target_tablename
DATA SHARING :
--------------
- Here provider Account_1 is accessed by consumer Account_2 and consumer can copy the data.
- Provider stores data with his storage Account_1 and
eg: GRANT USAGE ON DATABASE DATA_S TO SHARE ORDERS_SHARE;
GRANT USAGE ON SCHEMA DATA_S.PUBLIC TO SHARE ORDERS_SHARE;
GRANT SELECT ON TABLE DATA_S.PUBLIC.ORDERS TO SHARE ORDERS_SHARE;
- It can be shared to non-snowflake users with accountadmin account.
DATA SAMPLING :
---------------
- Instead of taking large amount data only some of the sample data is taken in data sampling.
- It is useful for query development or data analysis
- There are 2 meathods of sampling Row or Bernouli and Block or System meathods.
- 50% of data rows are selected in Row or Bernouli meathods
- 50% of data block are selected in Block or System meathods.
STREAMS :
---------
- Steam is an object that records or caputures DML changes made to the table.
- This process of caputure the changes to the data is called change data capture (CDC)
- It is created on the top of any specific table.
- It stores metadata but not the data of the tables on which it's created.
- Once the stream operation is completed, we consume steam object and data will be removed from it.
- When streams are created on a table, it will add two columns to it METADATA$ACTION and METADATA$UPDATE.
- METADATA$ACTION column will be updated with INSERT, DELETE and METADATA$UPDATE column will be updated with TRUE
- List of steams created on table are Insert, Update, Delete and Offset
eg: Create stream <steam_name> on table <table_name>
Select * from <steam_name>
Show steams;
DYNAMIC DATA MASKING :
----------------------
- Data masking is applied on sensitive data as additional layer of security.
- This is column level security by specifying some columns for masking
- This rules can be assigned to specific roles.
- In order to apply masking create masking policy and then alter table and apply masking policy to it.
eg: Create or replace masking policy mask_phone_number
AS (val varchar) Returns varchar
CASE WHEN Current_Role IN ('','')
THEN Val
ELSE '##-###-## '
END;
Alter table If exists customer modify column phone
SET MASKING POLICY mask_phone_number
========================================================================= EMP AND DEPT TABLES ====================================================
USE DATABASE OUR_FIRST_DB;
-- Create DEPT table
CREATE TABLE DEPT (
DEPTNO INTEGER NOT NULL,
DNAME STRING,
LOC STRING,
CONSTRAINT PK_DEPT PRIMARY KEY (DEPTNO)
);
-- Create EMP table
CREATE TABLE EMP (
EMPNO INTEGER NOT NULL,
ENAME STRING,
JOB STRING,
MGR INTEGER,
HIREDATE DATE,
SAL NUMBER(10,2),
COMM NUMBER(10,2),
DEPTNO INTEGER,
CONSTRAINT PK_EMP PRIMARY KEY (EMPNO),
CONSTRAINT FK_EMP_DEPT FOREIGN KEY (DEPTNO) REFERENCES DEPT(DEPTNO)
);
-- Insert into DEPT
INSERT INTO DEPT (DEPTNO, DNAME, LOC) VALUES
(10, 'ACCOUNTING', 'NEW YORK'),
(20, 'RESEARCH', 'DALLAS'),
(30, 'SALES', 'CHICAGO'),
(40, 'OPERATIONS', 'BOSTON');
-- Insert into EMP
INSERT INTO EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO) VALUES
(7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10),
(7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, NULL, 30),
(7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, NULL, 10),
(7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, NULL, 20),
(7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, NULL, 20),
(7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, NULL, 20),
(7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30),
(7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, NULL, 30),
(7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30),
(7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30),
(7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30),
(7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, NULL, 10),
(7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, NULL, 20),
(7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, NULL, 20);
============================================================== QUESTIONS =========================================================================
** List of important syntaxes in snowflake
- Show tables;
- Show stages;
- Desc table emp;
- Desc STAGE STG_CSV
- Select * from Emp Limit 1;
- Select * from Emp where deptno=10;
- Select deptno, avg(sal) from Emp group by deptno;
- CREATE TABLE emp1 AS SELECT * FROM emp;
** Insufficient privileges to operate on schema 'DEV'.
SELECT CURRENT_ROLE();
SHOW GRANTS ON SCHEMA DEV;
** What are the contraints available in snowflake? What does it enforce?
- List of the contrainst are primary key, foreign key, not null, default, unique and check
- Only not null constraints are enforced.
- It doesn't enforce Primary and foreign key constraints.
** What are the different types of data insertion happens in snowflake?
- UI
- SNOWSQL
- SNOWPIPE (copy into using external stage)
- Third party tools (Metallion, informatica)
** List of data insertion performed till now
- Ingest OR Load csv, json & xml files into snowflake : Select database => Click 3 dots on right => Load data => 'Load into a Table' => Upload files =>
Select create table => Give table name
- Load Data into a Table : Click on Ingestion => Add data => Select 'Load into a Table'
- Load Data into a Stage : Click on Ingestion => Add data => Select 'Load into a Stage'
- Load parquet data : Click on Data => Add data => 'Load into a Stage'
** Is copy into not available from table to table?
- The COPY INTO command in Snowflake is designed for data movement between tables and external/internal stages (files) — not from one table to another.
** Why copy into command is used?
- The COPY INTO command in Snowflake is used to load data into tables (from staged files) or unload data from tables (to files in a stage). It’s one of the most
important commands for data ingestion and export.
** What is Snowflake stage?
- A stage (internal or external) in Snowflake is not a database structure.
- It’s purely a file storage area (like a temporary cloud folder) where data files are stored.
- You can copy data to a stage — but it always becomes files (CSV, Parquet, JSON, etc.)
- You cannot copy data to a stage as “rows” or “table-like” data — because a stage doesn’t store table rows, only files
** List of objects we required to insert data
- File
- Table
- File format
- Stage
** What are the different types of tables in snowflake?
- Permanent tables : Default table. Time travel is 90 days. Fail safe 7 days are available only for permanent table.
Create table table_name;
- Temporary tables : Time travel is 1 day. Fail safe is not available here.
Create temporary table orders (Oid int, Oname varchar);
- Transient tables : Time travel is 1 day. Fail safe is not available here.
Create or replace transient table employee (eid int, ename varchar);
- External tables : No Time travel here. Fail safe is not available here.
- Dynamic tables :
- Directory tables :
- Event tables :
- Hybrid tables : Primary key is mandatory. Hybrid tables are currently not available to trial accounts.
Create or replace hybrid table HCUSTOMER (CUSTID int PRIMARY KEY);
- Iceberg tables :
** What are the 3 layers of snowflake?
- Data storage : Data stored in compressed and columnar stucture
- Query processing/Compute : It's made up of virtual warehouses. It executes tasks required for data processing.
- Cloud services : Manages infrasturcture, metadata, security and organization.
** What is SnowSQL?
- It's used when access is not given to snowflake UI.
- It'S called snowflake command line interfact (CLI).
- DML, DDL statements can be performed here.
*
** How to check UDF?
eg: Show user functions
Show functions in schema schema_name
show external functions
show external information_schema.functions
** Can we create snow pipe on internal stages?
- Yes, we can create snow pipe on internal stages.
- There are 3 stages User, table and Named stages.
* How to call UDF?
-
* How to call procedure?
-
* How to load only valid records to the target using copy statement?
- Use copy into my_table from
* Does secured materialized view ococpy storage?
- Yes, it will occupy storagte
* Are the DML's are auto-commited in snowflake?
* How to implement CDC?
- Steams are used to implement CDC.
* Can materialized views created on mulitple tables in snowflake?
- No, it can't be created.
* Which constrains snowflake enforces?
- Snowflake only enforces NOT NULL constraints by default; all other constraints, such as PRIMARY KEY, UNIQUE, and FOREIGN KEY, are not enforced on standard tables
** What's the difference betweens streams and tasks?
- A stream is a change data capture (CDC) object that tracks changes (inserts, updates, deletes) made to a table. whereas
- A task is a scheduling object that lets you run SQL statements (including stored procedures) automatically and on a schedule or when triggered.
** How to get row count from a table?
- Use "DETAILED_OUTPUT=TRUE" line to the syntax and row_count will appear in the result.
** How to copy files from one stage to another stage (aws to azure)?
- COPY FILES INTO @TABLENAME FROM @TABLENAME => This copies all the files from stage1 to stage2.
- COPY INTO @TABLENAME FROM @TABLENAME => This copies table from stage1 to stage2.
* What are the different types of indexes in Sql server and Oracle? Identify the differences.
In SQL Server and Oracle, indexes are crucial database structures that improve query performance. Both databases support several types of indexes, but their implementations and some features differ. Below is a comparison of the types of indexes and their differences.
Types of Indexes in SQL Server:
---------------------------------------
1. Clustered Index
   * Data is physically sorted according to the indexed column(s).
   * Only one clustered index per table.
2. Non-Clustered Index
   * Separate structure from the data. Contains pointers to actual rows.
   * A table can have multiple non-clustered indexes.
3. Unique Index
   * Ensures all values in the indexed column(s) are unique.
4. Filtered Index
  * A non-clustered index with a WHERE clause (e.g., active = 1).
5. Columnstore Index
   * Stores data column-wise, suited for analytics / data warehouses.
   * Types: Clustered or Non-clustered Columnstore Index.
6. Full-text Index
   * Used for full-text searches (e.g., searching for words/phrases in documents).
7. XML Index
   * Designed for indexing XML data types.
8. Spatial Index
   * Optimizes spatial queries (geometry and geography types).
Types of Indexes in Oracle:
----------------------------------
1. B-Tree Index (default)
   * Standard index type, similar to SQL Server's non-clustered index.
2. Bitmap Index
   * Uses bitmaps instead of row pointers. Efficient for low-cardinality columns.
3. Unique Index
   * Automatically created with unique constraints.
4. Composite Index
   * Index on multiple columns.
5. Function-Based Index
   * Index based on an expression or function (e.g., `UPPER(name)`).
6. Reverse Key Index
* Reverses byte order of the key to reduce hot spots in inserts.
7. Domain Index
   * Custom indexes for complex data types like spatial, text, etc.
8. Cluster Index
   * Associated with table clusters, not to be confused with SQL Server's clustered index.
9. Index-Organized Table (IOT)
   * Entire table is stored as a B-tree index (no separate table storage).
10. Bitmap Join Index
    * Joins and indexes two or more tables for faster query performance.                          
* SQL Server focuses more on clustered/non-clustered, filtered, and columnstore indexes.
* Oracle offers more index types for DSS/OLAP systems (like bitmap and function-based indexes).
* Index-Organized Tables in Oracle are similar to clustered indexes in SQL Server.
Oracle—Function Syntax:
-------------------------------
CREATE OR REPLACE FUNCTION function_name (
param1 IN datatype,
param2 IN datatype
)
RETURN return_datatype
IS
result return_datatype;
BEGIN    
-- Your logic here
result := ...;
RETURN result;
END;
PROGRAM:
---------------
CREATE OR REPLACE FUNCTION GetFullName (
firstName IN VARCHAR2,
lastName IN VARCHAR2
)
RETURN VARCHAR2
IS
fullName VARCHAR2(100);
BEGIN
fullName := firstName || ' ' || lastName;
RETURN fullName;
END;
SQL Server – Function Syntax:
--------------------------------------
1. Scalar Function (returns a single value)
CREATE FUNCTION function_name (@param1 datatype, @param2 datatype)
RETURNS return_datatype
AS
BEGIN
DECLARE @result return_datatype;    
   -- Your logic here
    SET @result = ...;
    RETURN @result;
END;
PROGRAM:
---------------
CREATE FUNCTION GetFullName (@firstName NVARCHAR(50), @lastName NVARCHAR(50))
RETURNS NVARCHAR(100)
AS
BEGIN
    RETURN (@firstName + ' ' + @lastName);
END;
2. Table-Valued Function
Syntax:
---------
CREATE FUNCTION function_name (@param datatype)
RETURNS TABLE
AS
RETURN
(
    SELECT * FROM SomeTable WHERE Column = @param
);
PROGRAM:
---------------
CREATE FUNCTION GetEmployeesByDept(@deptId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT * FROM Employees WHERE DeptID = @deptId
);
* Write a program to get employees based on deptno (Table-Valued Function)
ORACLE:
-------------
CREATE OR REPLACE FUNCTION GetEmployeesByDept 
(
    p_deptno IN NUMBER
) 
RETURN SYS_REFCURSOR
AS
    emp_cursor SYS_REFCURSOR;
BEGIN
    OPEN emp_cursor FOR
        SELECT * FROM emp WHERE deptno = p_deptno;
    RETURN emp_cursor;
END;
/
SQL:
------
CREATE OR ALTER FUNCTION GetEmployeesByDept
(
@deptno INT
)
RETURNS TABLE
AS
RETURN
(
    SELECT * FROM Emp WHERE Deptno = @deptno
);
SELECT * FROM dbo.GetEmployeesByDept(10);
* Write a program to get Concat employees name, job and sal (Scalar function)
SQL:
------
CREATE OR ALTER FUNCTION CONCAT_EMP_DETAILS 
(
@ename NVARCHAR(50), 
@job NVARCHAR(50),
@sal int
)
RETURNS NVARCHAR(100)
AS
BEGIN
    RETURN (@ename + ' ' + @job + ' ' + CAST(@sal AS nvarchar));
END;
SELECT dbo.CONCAT_EMP_DETAILS('SMITH', 'CLERK', 3000);
ORACLE:
-------------
CREATE OR REPLACE FUNCTION CONCAT_EMP_DETAILS (
    p_ename IN VARCHAR2,
    p_job   IN VARCHAR2,
    p_sal   IN NUMBER
) RETURN VARCHAR2
IS
    v_result VARCHAR2(200);
BEGIN
    v_result := p_ename || ' ' || p_job || ' ' || TO_CHAR(p_sal);
    RETURN v_result;
END;
/
* Functions do not use EXEC or EXECUTE like stored procedures. 
   You can also join it with other tables.
SELECT e.ename, d.dname
FROM dbo.GetEmployeesByDept(20) e
JOIN Dept d ON e.deptno = d.deptno;
Oracle and SQL Server Triggers differences
ORACLE:
--------------
CREATE OR REPLACE TRIGGER trigger_name
AFTER INSERT ON table_name
FOR EACH ROW
BEGIN
-- Trigger logic here
INSERT INTO audit_table(user_name, action_date)
VALUES (USER, SYSDATE);
END;
/
Program:
-----------
CREATE OR REPLACE TRIGGER trig_emp_table
AFTER DELETE ON EMP
FOR EACH ROW
BEGIN
INSERT INTO emp_trigger_table (empno, ename, job, mgr, hiredate, sal, comm, deptno)
VALUES (:OLD.empno, :OLD.ename, :OLD.job, :OLD.mgr, :OLD.hiredate, :OLD.sal, :OLD.comm, :OLD.deptno);
END;
/
(OR)
CREATE OR REPLACE TRIGGER trig_emp_table
AFTER DELETE ON EMP
FOR EACH ROW
BEGIN
INSERT INTO emp_trigger_table (empno, ename, job, mgr, hiredate, sal, comm, deptno)
VALUES (:New.empno, :New.ename, :New.job, :New.mgr, :New.hiredate, :New.sal, :New.comm, :New.deptno);
END;
/
SQL SERVER:
------------------
CREATE TRIGGER trigger_name
ON table_name
AFTER INSERT
AS
BEGIN
-- Trigger logic here
INSERT INTO audit_table(user_name, action_date)
SELECT SYSTEM_USER, GETDATE();
END;
Program:
-----------
CREATE TRIGGER trig_emp_table
ON EMP
AFTER DELETE
AS
BEGIN
INSERT INTO emp_trigger_table (empno, ename, job, mgr, hiredate, sal, comm, deptno)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM deleted;
END;
(OR)
CREATE OR ALTER TRIGGER trig_emp_table
ON EMP
AFTER INSERT
AS
BEGIN
INSERT INTO emp_trigger_table (empno, ename, job, mgr, hiredate, sal, comm, deptno)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno    
FROM inserted;
END;
In SQL Server, `deleted` is a special, system-generated table, not a real user-defined table. It's automatically available inside DML triggers to let you access the "before" values of the rows affected by a `DELETE` or `UPDATE`.
SQL Server uses relational concepts, and internally, during a trigger execution, it creates two virtual tables:
 Pseudo-table | Used in                            | Contains                         
 --------------	-----------			         -----------
 `deleted`        | `DELETE`, `UPDATE`  | The old (pre-change) row values  
 `inserted`       | `INSERT`, `UPDATE`   | The new (post-change) row values 
These act like temporary tables and can be queried using normal SQL syntax like:
SELECT * FROM deleted;
Or joined with actual tables if needed:
SELECT d.empno, e.ename 
FROM deleted d
JOIN other_table e ON d.empno = e.empno;
Suppose this statement runs:
DELETE FROM emp WHERE empno = 7369;
Then inside your trigger, the `deleted` table will contain just one row: the one with `empno = 7369`.
In Oracle, `:OLD` is a bind variable used in row-level triggers to refer to the previous (old) values of a row before it was modified or deleted.
* `:OLD` is a record that holds the column values before the triggering DML operation (`UPDATE` or `DELETE`).
* It’s only available in row-level triggers, not statement-level ones.
* You must specify `FOR EACH ROW` in the trigger for `:OLD` to work.
Behavior by DML Type:
 Trigger Type | `:OLD` Available? | `:NEW` Available? | Description  
 ----------------    ----------------------    ----------------------    --------------
 `INSERT`     | NO                          | Yes                         | Only new values exist        
 `UPDATE`   | Yes                          | Yes                         | Both old and new values exist
 `DELETE`   | Yes                          | No                          | Only old values exist        
Example: `AFTER UPDATE` Trigger
CREATE OR REPLACE TRIGGER trg_audit_emp
AFTER UPDATE ON emp
FOR EACH ROW
BEGIN
  INSERT INTO emp_audit (empno, old_sal, new_sal)
  VALUES (:OLD.empno, :OLD.sal, :NEW.sal);
END;
/
Note:
-----
* :OLD.sal is the employee's salary before the update.
* :NEW.sal is the updated salary
* You cannot modify :OLD—it's read-only.
* In contrast, :NEW can be written to in `BEFORE INSERT` or `BEFORE UPDATE` triggers.
There are syntactical and functional differences between Oracle and SQL Server when it comes to writing stored procedures.
Both use different procedural languages:
Oracle uses PL/SQL (Procedural Language/SQL)
SQL Server uses T-SQL (Transact-SQL)
Here's a comparison highlighting key syntactical differences:
1. Procedure Declaration
Oracle:
-------
CREATE OR REPLACE PROCEDURE proc_name (
 p_id IN NUMBER,
p_name OUT VARCHAR2
)
AS
BEGIN
-- PL/SQL block
END;
SQL Server:
-----------
CREATE PROCEDURE proc_name
@id INT,
@name NVARCHAR(100) OUTPUT
AS
BEGIN
-- T-SQL block
END;
2. Variable Declaration
Oracle:
-------
DECLARE
v_total NUMBER;
SQL Server:
-----------	
DECLARE @total INT;
3. Calling a Stored Procedure with Output Parameters
Oracle:
-------
DECLARE
v_name VARCHAR2(100);
BEGIN
proc_name(101, v_name);
DBMS_OUTPUT.PUT_LINE(v_name);
END;
SQL Server:
-----------	
DECLARE @name NVARCHAR(100);
EXEC proc_name @id = 101, @name = @name OUTPUT;
PRINT @name;
4. Exception Handling
Oracle:
-------
BEGIN
-- statements
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error');
END;
SQL Server:
-----------
BEGIN TRY
-- statements
END TRY
BEGIN CATCH
PRINT 'Error';
END CATCH
STORED PROCEDURES: ----------------------------------
* Write a program to get department-wise salaries.
SQL:
Create or Alter procedure proc_depwise_salaries
@in_deptno int
AS
Begin
Select sal from emp where deptno=@in_deptno;
end;
Execute proc_depwise_salaries @in_deptno=10
ORACLE:
Create or replace Procedure Get_deptwises_sal (
input_deptno IN NUMBER )
AS
BEGIN
select * from emp where deptno=in_deptno;
end;
BEGIN
Get_deptwises_sal(10);
END;
/
* Write a program to get max salaries department wise
SQL:
----  
Create or Alter procedure proc_max_salaries_depwise
@deptno int,
@maxsal int 
AS
Begin
Select @maxsal = max(sal) from emp where @deptno=deptno group by deptno;
end;
DECLARE @result INT;
EXEC proc_max_salaries_depwise @deptno=10, @maxsal =@result
ORACLE:
-------------
CREATE OR REPLACE PROCEDURE proc_max_salaries_depwise
( p_deptno IN  NUMBER,
  p_maxsal OUT NUMBER )
AS
BEGIN
SELECT MAX(sal) INTO p_maxsal FROM emp WHERE  deptno = p_deptno;
END;
/
DECLARE
v_maxsal NUMBER;
BEGIN
proc_max_salaries_depwise(10, v_maxsal);
DBMS_OUTPUT.PUT_LINE('Max salary = ' || v_maxsal);
END;
/
Declaring Parameters in Stored Procedures:
-----------------------------------------------------
When you define input and output parameters for a stored procedure in SQL Server, you do not use the DECLARE keyword.
Example:
------------
CREATE PROCEDURE GetDeptSalaryStats
    @inputDeptNo INT,                 -- Input parameter
    @totalSalary DECIMAL(18,2) OUTPUT, -- Output parameter
    @averageSalary DECIMAL(18,2) OUTPUT -- Output parameter
AS
BEGIN
    -- No need to DECLARE again, these are already declared as parameters
    SELECT 
        @totalSalary = SUM(sal),
        @averageSalary = AVG(sal)
    FROM emp
    WHERE deptno = @inputDeptNo;
END;
Note:
-------
* @inputDeptNo, @totalSalary, and @averageSalary are parameters, not local variables.
* When parameters are declared in the procedure header, DECLARE is not used.
* You can use them directly inside the body of the procedure.
You use DECLARE only inside the procedure body or script to declare local variables, like this:
DECLARE @x INT = 10;
This is not a parameter but a local variable, usable only in that block or batch.
| Usage Context | Syntax Example | Uses DECLARE? | Scope
| ------------------------- | ----------------------------------- | ------------- | -----------------
| Procedure parameters | @deptno INT, @totSal DECIMAL OUTPUT | No | Across procedure
| Local variable | DECLARE @x INT = 5; | Yes | Inside procedure
| Ad-hoc script variable | DECLARE @deptno INT = 10; | Yes | In script/session
Hive organizes the tables into partitions, and partitions are subdivided into buckets.
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)
ALTER TABLE employee
ADD PARTITION (year=’2012’)
location '/2012/part2012';
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
ALTER TABLE employee PARTITION (year=’1203’)
RENAME TO PARTITION (Yoj=’1203’);
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...;
ALTER TABLE employee DROP [IF EXISTS]
PARTITION (year=’1203’);
CREATE DATABASE [IF NOT EXISTS] userdb;
DROP DATABASE IF EXISTS userdb;
IF NOT EXISTS : This is a optional clause we are using to remove the table if it exist.
SHOW DATABASE : This command is used to list the databases.
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
(OR)
CREATE TABLE IF NOT EXISTS employee ( 
eid int, 
name String,
salary String, 
destination String )
COMMENT ‘Employee details’
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’
STORED AS TEXTFILE;
Brief explanation of the above syntax:
DELIMITED is a Row Format
‘\t’ is a Field terminator 
‘\n’ is a Lines terminator
TEXTFILE is a File type.
Generally, we use to insert data into sql and oracle databases using INSERT statments but here we insert data into hive using LOAD DATA statement.
LOAD DATA statement is used to store bulk records.
We have two options to load data 
- Local file system
- Hadoop file system
SYNTAX:
-------------
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename 
[PARTITION (partcol1=val1, partcol2=val2 ...)]
QUERY:
-----------
LOAD DATA LOCAL INPATH '/home/user/sample.txt'
OVERWRITE INTO TABLE employee;
PARTITION is optional to use in the query.
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
DROP TABLE IF EXISTS employee;
Data types in Hadoop are classified into 4 types
- Column Types
- Literals
- Null Values
- Complex Types
When the data range exceeds the range of INT, you need to use BIGINT and if the data range is smaller than the INT, you use SMALLINT. TINYINT is smaller than SMALLINT.
String type data types can be specified using single quotes (' ') or double quotes (" "). It contains two data types: VARCHAR and CHAR
Timestamp (“YYYY-MM-DD HH:MM:SS.fffffffff”)
Dates (YYYY-MM-DD)
Decimals (10,0)
Union Types : It is a collection of heterogeneous data types.
Null Value : Missing values are referred here as null values.
Arrays : These are used just like in java.
Maps : These are similar to java maps
Structs : Here complex data is used with comments.
The term bigdata is used to refer large volumes of data. Using the traditional way to handle this data is very difficult. So, the Apache software has created a framework called hadoop to handle these problems.
Hadoop is the open-source framework to stored and process bigdata. It contains two modules Hadoop Distributed file system (HDFS) and MapReduce
MapReduce: It is used to process the large volumes or structured, semi-structured and unstructured data.
HDFS : It is used to store the data.
There are certain tools like Sqoop, Pig, and Hive helps hadoop modules.
Hive is not a Relational data base. It is query engine.
Import Command is used to import data from DBMS to hive and Exports command is used to export data into specific location.
Window functions perform calculations across a set of rows related to the current row without collapsing the rows into a single result. Unlike aggregate functions, window functions retain the row-level details while adding aggregated or ranked data as additional columns.
Window functions and analytical functions are the same.
Key Components of Window Functions:
- OVER() Clause: Defines the window of rows to perform calculations on.
- PARTITION BY: Divides rows into groups or partitions for calculation.
- ORDER BY: Specifies the order of rows within the window.
Questions and Answers:
------------------------------
* How can I calculate a running total of sales for each customer?
SELECT 
CustomerID, 
OrderDate, 
Amount, 
SUM(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM Sales;
- `PARTITION BY CustomerID`: Groups rows by each customer.
- `ORDER BY OrderDate`: Orders rows by date within each group.
- `SUM(Amount)`: Adds up the sales amounts progressively.
* How can I rank employees based on their salary within each department?
SELECT 
DepartmentID, 
EmployeeID, 
Salary, 
RANK() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS Rank
FROM Employees;
- `RANK()`: Assigns a rank to each employee in descending salary order.
- `PARTITION BY DepartmentID`: Ensures the ranking is specific to each department.
* How can I calculate the percentile rank of students based on their scores?
SELECT 
StudentID, 
Score, 
PERCENT_RANK() OVER (ORDER BY Score DESC) AS Percentile
FROM Students;
- `PERCENT_RANK()`: Calculates the relative standing of each score in the list.
- No `PARTITION BY` clause means the calculation applies to all rows.
* How can I find the sales amount of the previous order for each customer?
SELECT 
CustomerID, 
OrderID, 
OrderDate, 
Amount, 
LAG(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PreviousOrderAmount
FROM Sales;
- `LAG(Amount)`: Returns the value of the previous row in the window.
- `PARTITION BY CustomerID`: Ensures the calculation is specific to each customer.
* How can I calculate the 3-month moving average of sales for each product?
SELECT 
ProductID, 
OrderDate, 
Amount, 
AVG(Amount) OVER (PARTITION BY ProductID ORDER BY OrderDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg
FROM Sales;
- `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`: Defines a sliding window of 3 rows.
- `AVG(Amount)`: Calculates the average over this window.
These examples demonstrate how window functions can provide powerful tools for advanced analytics while preserving row-level details.
The different types of assets in IICS are - Tasks - Data ingestion and Replication - Mappings - Mapplets - Taskflows - Components