Snowflake is the first clould based platform built it as dataware as services (daas). It can run of cloud providers like AWS, GCP and AZURE.
It doesn't require hardware and software to install and configure. It runs completely on public cloud infrastructure.
SNOWFLAKE ARCHITECTURE :
------------------------
* Data storage layer :
- Data is not stored internally. It's stored using external cloud provider upon our selection while create the snowflake account.
- Data stored in columns and compressed into blobs
* Query processing/Compute :
- It's called the muscle of the system.
- It's the layer where the queries are processed.
- It's made up of virtual warehouses. It executes tasks required for data processing.
- This layer processes MPP (Massive Parallel Processing)
* Cloud services :
- It's called the brain of the system.
- Managing infrastructure, access control, s ecurity optimizer and metadata are done here.
CREATING A WAREHOUSE :
----------------------
CREATE OR REPLACE WAREHOUSE FIRST_WAREHOUSE
WITH
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 300
SCALING_POLICY = 'Economy' (To maximize cost efficiency by scaling only when necessary)
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT =3
INITIALLY_SUSPENDED = TRUE
COMMENT = "This is our first warehouse"
LOADING DATA :
--------------
* BULK LOADING:
- It uses our own databases and warehouses
- It uses copy command
- Transformations
* CONTINIOUS LOADING:
- Small volumes of data
- Serverless
STAGES :
--------
It's the storage area used for data processing during the extract, transform and load process. There are 2 types Internal and External.
* EXTERNAL STORAGE
- External cloud provider
- S3 bucket
- Google cloud platform
- Microsoft Azure
- Database Object created in schema
* INTERNAL STORAGE
- Table stage => They are automatically defined, No setup needed and accessed only by table owner.
- LIST @~ => This command is used to display table stage.
- User stage => They are automatically defined, No setup needed and accessed only by user.
- LIST @%table => This command is used to display user stage.
- Named Internal stage
- LIST @table => This command is used to display Named stage.
- Local storage is maintained by snowflake
- Select $1, $2, $3 from stage_table is used to access the table
COPY COMMAND :
--------------
* Copy into <table_name> from <stage_name>
* Parameters:
- ON_ERROR = {CONTINUE/SKIP_FILE}
- VALIDATTION_MODE = RETURN_ROWS
(OR)
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS
FROM @aws_stage
file_format = (type = csv field_delimiter=',' skip_header=1);
PERFORMANCE TUNING :
--------------------
Most of the performance tuning is done by snowflake automatically.
* Dedicated warehouses
- To DBA, Data Scientists and Analysts
* Scaling up
- Increasing warehouses size
* Scaling down
- Adding warehouses/clusters
* Caching
- Automated process when query is executed
* Clustering
- Its automatically maintained by snowflake
* What is a Snowpipe?
- Snowpipe is Snowflake's continuous, fully managed data ingestion service.
- It automatically loads data into Snowflake from cloud storage services like Amazon S3, Azure Blob Storage, and Google Cloud Storage
Step of Setting up snowpipe :
-----------------------------
- Create Stage
- Copy Command
- Create Pipe
- S3 Notification
* What is Time travel?
- If the data or database is deleted or dropped then historical data is queried using time travel.
- Restored items like Query deleted, updated data, restored tables, schemas, databases, clones of tables.
eg: Select * from table AT (TIMESTAMP => timestamp)
Select * from table AT (OFFSET => -10*60)
Select * from table BEFORE (STATEMENT => query_id)
Undrop table table_name
Undrop schema schema_name
Undrop Database database_name
- Standard edition can travel back 1 day to the past whereas Enterprise, Business Critical and Virtual Private edition can travel back 90 days to the past.
It's also called as retention period.
- Data loss can happen when using time travel when you execute it 2nd time.
* What is fail safe?
- It is used to recover historical data when there is any disaster.
- It's not configurable, 7 day period for permanent tables.
- This state starts when the time travel ends.
- We need to reach snowflake to recoved it as we can't do anything on it as users.
DIFFERENT TYPES OF TABLES :
---------------------------
- They are created using create table command.
- Time travel retention period exists here.
- Fail safe exists here.
- They are created using create transient table command.
- Time travel retention period exists here.
- They are created using create temporary table command.
- Time travel retention period exists here.
- They only exists in current session.
External Tables, Hybrid Tables, and Iceberg Tables
CLONING TABLES :
----------------
* Zero copy cloning : It means data will not be copied in this cloning.
- It's used to clone datbases, schemas, tables, stage, pipe, stream and tasks.
- All the child objects will be clone but not internal stages.
- It's an independent table and will reference original table.
- It will not be effected when original tables is modified.
- Select previlage is enough for Tables
- Owner previlages are required for pipe, stream and tasks.
- Usage previlages for other objects
- Load history metadata is not copied.
- Cloning for specific point in time is possible.
- It's not possible to create temporary table into permanent table.
eg: Create table tablename clone table_source
SWAPPING TABLES :
-----------------
- It is used to take development code into production code.
- Metadata swapping is used here.
- It can be done on tables and schemas.
eg: Alter table table_name swap with target_tablename
DATA SHARING :
--------------
- Here provider Account_1 is accessed by consumer Account_2 and consumer can copy the data.
- Provider stores data with his storage Account_1 and
eg: GRANT USAGE ON DATABASE DATA_S TO SHARE ORDERS_SHARE;
GRANT USAGE ON SCHEMA DATA_S.PUBLIC TO SHARE ORDERS_SHARE;
GRANT SELECT ON TABLE DATA_S.PUBLIC.ORDERS TO SHARE ORDERS_SHARE;
- It can be shared to non-snowflake users with accountadmin account.
DATA SAMPLING :
---------------
- Instead of taking large amount data only some of the sample data is taken in data sampling.
- It is useful for query development or data analysis
- There are 2 meathods of sampling Row or Bernouli and Block or System meathods.
- 50% of data rows are selected in Row or Bernouli meathods
- 50% of data block are selected in Block or System meathods.
STREAMS :
---------
- Steam is an object that records or caputures DML changes made to the table.
- This process of caputure the changes to the data is called change data capture (CDC)
- It is created on the top of any specific table.
- It stores metadata but not the data of the tables on which it's created.
- Once the stream operation is completed, we consume steam object and data will be removed from it.
- When streams are created on a table, it will add two columns to it METADATA$ACTION and METADATA$UPDATE.
- METADATA$ACTION column will be updated with INSERT, DELETE and METADATA$UPDATE column will be updated with TRUE
- List of steams created on table are Insert, Update, Delete and Offset
eg: Create stream <steam_name> on table <table_name>
Select * from <steam_name>
Show steams;
DYNAMIC DATA MASKING :
----------------------
- Data masking is applied on sensitive data as additional layer of security.
- This is column level security by specifying some columns for masking
- This rules can be assigned to specific roles.
- In order to apply masking create masking policy and then alter table and apply masking policy to it.
eg: Create or replace masking policy mask_phone_number
AS (val varchar) Returns varchar
CASE WHEN Current_Role IN ('','')
THEN Val
ELSE '##-###-## '
END;
Alter table If exists customer modify column phone
SET MASKING POLICY mask_phone_number
========================================================================= EMP AND DEPT TABLES ====================================================
USE DATABASE OUR_FIRST_DB;
-- Create DEPT table
CREATE TABLE DEPT (
DEPTNO INTEGER NOT NULL,
DNAME STRING,
LOC STRING,
CONSTRAINT PK_DEPT PRIMARY KEY (DEPTNO)
);
-- Create EMP table
CREATE TABLE EMP (
EMPNO INTEGER NOT NULL,
ENAME STRING,
JOB STRING,
MGR INTEGER,
HIREDATE DATE,
SAL NUMBER(10,2),
COMM NUMBER(10,2),
DEPTNO INTEGER,
CONSTRAINT PK_EMP PRIMARY KEY (EMPNO),
CONSTRAINT FK_EMP_DEPT FOREIGN KEY (DEPTNO) REFERENCES DEPT(DEPTNO)
);
-- Insert into DEPT
INSERT INTO DEPT (DEPTNO, DNAME, LOC) VALUES
(10, 'ACCOUNTING', 'NEW YORK'),
(20, 'RESEARCH', 'DALLAS'),
(30, 'SALES', 'CHICAGO'),
(40, 'OPERATIONS', 'BOSTON');
-- Insert into EMP
INSERT INTO EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO) VALUES
(7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10),
(7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, NULL, 30),
(7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, NULL, 10),
(7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, NULL, 20),
(7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, NULL, 20),
(7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, NULL, 20),
(7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30),
(7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, NULL, 30),
(7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30),
(7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30),
(7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30),
(7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, NULL, 10),
(7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, NULL, 20),
(7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, NULL, 20);
============================================================== QUESTIONS =========================================================================
** List of important syntaxes in snowflake
- Show tables;
- Show stages;
- Desc table emp;
- Desc STAGE STG_CSV
- Select * from Emp Limit 1;
- Select * from Emp where deptno=10;
- Select deptno, avg(sal) from Emp group by deptno;
- CREATE TABLE emp1 AS SELECT * FROM emp;
** Insufficient privileges to operate on schema 'DEV'.
SELECT CURRENT_ROLE();
SHOW GRANTS ON SCHEMA DEV;
** What are the contraints available in snowflake? What does it enforce?
- List of the contrainst are primary key, foreign key, not null, default, unique and check
- Only not null constraints are enforced.
- It doesn't enforce Primary and foreign key constraints.
** What are the different types of data insertion happens in snowflake?
- UI
- SNOWSQL
- SNOWPIPE (copy into using external stage)
- Third party tools (Metallion, informatica)
** List of data insertion performed till now
- Ingest OR Load csv, json & xml files into snowflake : Select database => Click 3 dots on right => Load data => 'Load into a Table' => Upload files =>
Select create table => Give table name
- Load Data into a Table : Click on Ingestion => Add data => Select 'Load into a Table'
- Load Data into a Stage : Click on Ingestion => Add data => Select 'Load into a Stage'
- Load parquet data : Click on Data => Add data => 'Load into a Stage'
** Is copy into not available from table to table?
- The COPY INTO command in Snowflake is designed for data movement between tables and external/internal stages (files) — not from one table to another.
** Why copy into command is used?
- The COPY INTO command in Snowflake is used to load data into tables (from staged files) or unload data from tables (to files in a stage). It’s one of the most
important commands for data ingestion and export.
** What is Snowflake stage?
- A stage (internal or external) in Snowflake is not a database structure.
- It’s purely a file storage area (like a temporary cloud folder) where data files are stored.
- You can copy data to a stage — but it always becomes files (CSV, Parquet, JSON, etc.)
- You cannot copy data to a stage as “rows” or “table-like” data — because a stage doesn’t store table rows, only files
** List of objects we required to insert data
- File
- Table
- File format
- Stage
** What are the different types of tables in snowflake?
- Permanent tables : Default table. Time travel is 90 days. Fail safe 7 days are available only for permanent table.
Create table table_name;
- Temporary tables : Time travel is 1 day. Fail safe is not available here.
Create temporary table orders (Oid int, Oname varchar);
- Transient tables : Time travel is 1 day. Fail safe is not available here.
Create or replace transient table employee (eid int, ename varchar);
- External tables : No Time travel here. Fail safe is not available here.
- Dynamic tables :
- Directory tables :
- Event tables :
- Hybrid tables : Primary key is mandatory. Hybrid tables are currently not available to trial accounts.
Create or replace hybrid table HCUSTOMER (CUSTID int PRIMARY KEY);
- Iceberg tables :
** What are the 3 layers of snowflake?
- Data storage : Data stored in compressed and columnar stucture
- Query processing/Compute : It's made up of virtual warehouses. It executes tasks required for data processing.
- Cloud services : Manages infrasturcture, metadata, security and organization.
** What is SnowSQL?
- It's used when access is not given to snowflake UI.
- It'S called snowflake command line interfact (CLI).
- DML, DDL statements can be performed here.
*
** How to check UDF?
eg: Show user functions
Show functions in schema schema_name
show external functions
show external information_schema.functions
** Can we create snow pipe on internal stages?
- Yes, we can create snow pipe on internal stages.
- There are 3 stages User, table and Named stages.
* How to call UDF?
-
* How to call procedure?
-
* How to load only valid records to the target using copy statement?
- Use copy into my_table from
* Does secured materialized view ococpy storage?
- Yes, it will occupy storagte
* Are the DML's are auto-commited in snowflake?
* How to implement CDC?
- Steams are used to implement CDC.
* Can materialized views created on mulitple tables in snowflake?
- No, it can't be created.
* Which constrains snowflake enforces?
- Snowflake only enforces NOT NULL constraints by default; all other constraints, such as PRIMARY KEY, UNIQUE, and FOREIGN KEY, are not enforced on standard tables
** What's the difference betweens streams and tasks?
- A stream is a change data capture (CDC) object that tracks changes (inserts, updates, deletes) made to a table. whereas
- A task is a scheduling object that lets you run SQL statements (including stored procedures) automatically and on a schedule or when triggered.
** How to get row count from a table?
- Use "DETAILED_OUTPUT=TRUE" line to the syntax and row_count will appear in the result.
** How to copy files from one stage to another stage (aws to azure)?
- COPY FILES INTO @TABLENAME FROM @TABLENAME => This copies all the files from stage1 to stage2.
- COPY INTO @TABLENAME FROM @TABLENAME => This copies table from stage1 to stage2.
 
No comments:
Post a Comment