CREATING A WAREHOUSE :
---------------------------------------
CREATE OR REPLACE WAREHOUSE FIRST_WAREHOUSE
WITH
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 300
SCALING_POLICY = 'ECONOMY'
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 3
INITIALLY_SUSPENDED = TRUE
COMMENT = 'This is our first warehouse';    -- string literals use single quotes; double quotes denote identifiers
LOADING DATA :
--------------
* BULK LOADING:
- Uses our own virtual warehouses (user-managed compute)
- Uses the COPY INTO command (see the sketch after this list)
- Supports basic transformations during the load
* CONTINUOUS LOADING:
- Small volumes of data, loaded as they arrive
- Serverless (Snowflake-managed compute, e.g. Snowpipe)
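- A minimal bulk-load sketch; the stage, file and table names are hypothetical:
eg: COPY INTO ORDERS
    FROM @my_stage/orders.csv
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    ON_ERROR = 'CONTINUE';    -- skip bad rows instead of aborting the load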
STAGES :
--------
* EXTERNAL STAGES
- Data lives with an external cloud provider:
  - Amazon S3 bucket
  - Google Cloud Platform (GCS bucket)
  - Microsoft Azure (Blob container)
- The stage itself is a database object created in a schema (see the sketch below)
* INTERNAL STAGES
- Storage is maintained by Snowflake itself
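- A minimal sketch of creating an external stage; the bucket URL and credentials are placeholders:
eg: CREATE OR REPLACE STAGE my_s3_stage
    URL = 's3://my-bucket/csv/'
    CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');
    LIST @my_s3_stage;    -- verify which files the stage can see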
PERFORMANCE TUNING :
--------------------
Most of the performance tuning is done by Snowflake automatically.
* Dedicated warehouses
- Separate warehouses for DBAs, data scientists and analysts
* Scaling up
- Increasing the warehouse size, for more complex queries (see the sketch after this list)
* Scaling out
- Adding warehouses/clusters, for more concurrent queries
* Caching
- Automated process when a query is executed; results are reused while the cache is valid
* Clustering
- It's automatically maintained by Snowflake; a custom cluster key can be set on very large tables
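- A hedged sketch of scaling up and of a custom cluster key; the column name is hypothetical:
eg: ALTER WAREHOUSE FIRST_WAREHOUSE SET WAREHOUSE_SIZE = 'SMALL';  -- scale up one size
    ALTER WAREHOUSE FIRST_WAREHOUSE SET MAX_CLUSTER_COUNT = 5;     -- allow more scale-out
    ALTER TABLE ORDERS CLUSTER BY (order_date);                    -- custom cluster key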
* What is Snowpipe?
- Snowpipe is Snowflake's continuous, fully managed data ingestion service.
- It automatically loads data into Snowflake from cloud storage services like Amazon S3, Azure Blob Storage, and Google Cloud Storage.
Steps for setting up Snowpipe :
-------------------------------
- Create a stage
- Test the COPY command
- Create the pipe (see the sketch below)
- Configure the S3 event notification that triggers the pipe
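- A minimal sketch of the pipe itself, reusing the hypothetical stage from the STAGES section:
eg: CREATE OR REPLACE PIPE orders_pipe
    AUTO_INGEST = TRUE    -- fire on S3 event notifications
    AS
    COPY INTO ORDERS
    FROM @my_s3_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
    SHOW PIPES;    -- copy notification_channel into the S3 event configuration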
* What is Time travel?
- If data or a whole database is deleted or dropped, the historical data can still be queried using time travel.
- It can query deleted or updated data, restore tables, schemas and databases, and create clones of tables from a point in time.
eg: SELECT * FROM table_name AT (TIMESTAMP => '<timestamp>'::TIMESTAMP);
    SELECT * FROM table_name AT (OFFSET => -10*60);    -- 10 minutes back
    SELECT * FROM table_name BEFORE (STATEMENT => '<query_id>');
    UNDROP TABLE table_name;
    UNDROP SCHEMA schema_name;
    UNDROP DATABASE database_name;
- Standard edition can travel back 1 day, whereas Enterprise, Business Critical and Virtual Private editions can travel back up to 90 days.
  This window is also called the retention period.
- Data loss can happen with time travel: restoring via CREATE OR REPLACE TABLE replaces the table and its history, so executing the restore a 2nd time can overwrite data you meant to keep.
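- A hedged restore sketch; restoring into a new table first avoids the overwrite risk above (the query id is a placeholder):
eg: CREATE TABLE customer_restored
    AS SELECT * FROM customer BEFORE (STATEMENT => '<query_id>');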
* What is fail safe?
- It is used to recover historical data after a disaster.
- It is a non-configurable 7-day period for permanent tables.
- This period starts when the time travel retention period ends.
- We need to reach out to Snowflake Support to recover it, as users cannot query or act on fail-safe data themselves.
DIFFERENT TYPES OF TABLES :
---------------------------
* Permanent tables
- They are created using the CREATE TABLE command.
- Time travel retention period exists here (up to 90 days on higher editions).
- Fail safe exists here.
* Transient tables
- They are created using the CREATE TRANSIENT TABLE command (see the sketch after this list).
- Time travel retention period exists here (0 or 1 day); no fail safe.
* Temporary tables
- They are created using the CREATE TEMPORARY TABLE command.
- Time travel retention period exists here (0 or 1 day); no fail safe.
- They only exist in the current session.
* Other table types: External tables, Hybrid tables, and Iceberg tables.
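- A minimal sketch contrasting the three creation commands; the table names are hypothetical:
eg: CREATE TABLE orders_perm (id INT);              -- permanent: time travel + fail safe
    CREATE TRANSIENT TABLE orders_tran (id INT);    -- transient: short time travel, no fail safe
    CREATE TEMPORARY TABLE orders_sess (id INT);    -- temporary: dropped when the session ends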
CLONING TABLES :
----------------
* Zero copy cloning : the underlying data is not physically copied when cloning.
- It's used to clone databases, schemas, tables, stages, pipes, streams and tasks.
- All child objects will be cloned, but not internal named stages.
- The clone is an independent object that initially references the original table's storage.
- It will not be affected when the original table is modified.
- SELECT privilege is enough for tables.
- OWNERSHIP privileges are required for pipes, streams and tasks.
- USAGE privileges for other objects.
- Load history metadata is not copied.
- Cloning from a specific point in time is possible (see the sketch after this list).
- It's not possible to clone a temporary table into a permanent table.
eg: CREATE TABLE new_table CLONE source_table;
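- A hedged point-in-time clone sketch, combining cloning with time travel; the offset is illustrative:
eg: CREATE TABLE orders_snapshot
    CLONE ORDERS AT (OFFSET => -60*60);    -- the table as it looked one hour ago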
SWAPPING TABLES :
-----------------
- It is used to promote development tables into production.
- Only metadata (the table name pointers) is swapped; no data is moved.
- It can be done on tables and schemas.
eg: ALTER TABLE dev_table SWAP WITH prod_table;
DATA SHARING :
--------------
- A provider account (Account_1) shares data that a consumer account (Account_2) can query; the consumer can also copy the shared data into its own tables.
- The provider keeps the data in its own storage (Account_1) and pays for storage; the consumer pays for the compute used to query it.
eg: CREATE SHARE ORDERS_SHARE;
    GRANT USAGE ON DATABASE DATA_S TO SHARE ORDERS_SHARE;
    GRANT USAGE ON SCHEMA DATA_S.PUBLIC TO SHARE ORDERS_SHARE;
    GRANT SELECT ON TABLE DATA_S.PUBLIC.ORDERS TO SHARE ORDERS_SHARE;
- It can be shared with non-Snowflake users through reader accounts, created with the ACCOUNTADMIN role (see the sketch below).
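- A hedged sketch of the remaining steps; the account identifiers are placeholders:
eg: ALTER SHARE ORDERS_SHARE ADD ACCOUNTS = <consumer_account>;    -- provider side
    CREATE DATABASE SHARED_ORDERS FROM SHARE <provider_account>.ORDERS_SHARE;    -- consumer side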
DATA SAMPLING :
---------------
- Instead of querying a large amount of data, only a sample of it is taken in data sampling.
- It is useful for query development or data analysis.
- There are 2 methods of sampling: ROW (also called BERNOULLI) and BLOCK (also called SYSTEM).
- In the ROW/BERNOULLI method, each row is included with the given probability (e.g. 50%).
- In the BLOCK/SYSTEM method, whole blocks (micro-partitions) are included with the given probability; it is faster but less uniformly random.
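- A minimal sketch of both methods; the 50 is the sampling probability in percent, and the seed makes the sample repeatable:
eg: SELECT * FROM ORDERS SAMPLE ROW (50);                 -- Bernoulli: each row kept with ~50% probability
    SELECT * FROM ORDERS SAMPLE SYSTEM (50) SEED (82);    -- block sampling, repeatable via the seed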
DYNAMIC DATA MASKING :
----------------------
- Data masking is applied to sensitive data as an additional layer of security.
- This is column-level security: specific columns are designated for masking.
- The masking rules can be assigned to specific roles.
- To apply masking, create a masking policy and then alter the table to attach the policy to a column.
eg: CREATE OR REPLACE MASKING POLICY mask_phone_number
    AS (val VARCHAR) RETURNS VARCHAR ->
    CASE WHEN CURRENT_ROLE() IN ('<privileged_role>')    -- roles allowed to see raw values
         THEN val
         ELSE '##-###-##'
    END;

    ALTER TABLE IF EXISTS customer MODIFY COLUMN phone
    SET MASKING POLICY mask_phone_number;
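- A hedged check, run under any role that is not in the policy's allow-list:
eg: USE ROLE PUBLIC;
    SELECT phone FROM customer LIMIT 5;    -- returns '##-###-##' instead of real numbers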