This document outlines the process of setting up a Snowflake connection and using valmi.io to synchronize data from a Snowflake database to any of the supported destinations.
To transfer data efficiently between tables in your database and supported destinations, valmi.io computes a “diff” that identifies the changes since the previous update. This minimizes the load on your database and on the APIs of destination services. To accomplish this, valmi.io creates and manages a set of tables in a designated bookkeeping schema, typically two or three tables per sync job.
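As a rough illustration of the bookkeeping mechanism (the table names here are hypothetical, and the actual tables valmi.io maintains may differ), a snapshot of the source table is kept in the bookkeeping schema and refreshed after each successful sync:

```sql
-- Hypothetical sketch: "users" is the source table being synced, and
-- "users_snapshot" is a bookkeeping copy taken during the previous sync.

-- Take the initial snapshot in the bookkeeping schema
CREATE TABLE IF NOT EXISTS users_snapshot AS SELECT * FROM users;

-- ... each sync compares the live table against this snapshot ...

-- Refresh the snapshot once a sync completes successfully
CREATE OR REPLACE TABLE users_snapshot AS SELECT * FROM users;
```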
To establish a successful connection with your Snowflake database, we recommend creating a dedicated user account exclusively for valmi.io, with a strong and unique password. This account will be used by valmi.io for authentication. To ensure proper functionality, the valmi account must be granted specific permissions as outlined below:
- Full administrative access to all tables within the bookkeeping schema. This includes the ability to create, delete, read from, and write to these tables.
- Read-only access to any tables and views in the desired schema that you want valmi.io to publish to your service destinations.
Configuring Snowflake permissions can be complex, given the many ways they can be granted. The script below is known to work correctly and follows Snowflake’s best practices for creating read-only roles in a hierarchy:
```sql
-- Create a role for the valmi user
CREATE ROLE VALMI_ROLE;

-- Ensure the sysadmin role inherits any privileges the valmi role is granted.
-- Note that this does not grant sysadmin privileges to the valmi role
GRANT ROLE VALMI_ROLE TO ROLE SYSADMIN;

-- Create a warehouse for the valmi user, optimizing for cost over performance
CREATE WAREHOUSE VALMI_WAREHOUSE WITH
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = FALSE;

-- Allow the valmi user to run queries in the warehouse
GRANT USAGE ON WAREHOUSE VALMI_WAREHOUSE TO ROLE VALMI_ROLE;

-- Allow the valmi user to start and stop the warehouse
-- and abort running queries in the warehouse
GRANT OPERATE ON WAREHOUSE VALMI_WAREHOUSE TO ROLE VALMI_ROLE;

-- Allow the valmi user to see historical query statistics
-- on queries in its warehouse
GRANT MONITOR ON WAREHOUSE VALMI_WAREHOUSE TO ROLE VALMI_ROLE;

-- Create the valmi user
-- Do not set DEFAULT_NAMESPACE; this will impact which tables
-- are visible to valmi.io
CREATE USER VALMI WITH
  DEFAULT_ROLE = VALMI_ROLE
  DEFAULT_WAREHOUSE = VALMI_WAREHOUSE
  PASSWORD = '<strong, unique password>';

-- Grant the valmi role to the valmi user
GRANT ROLE VALMI_ROLE TO USER VALMI;

-- Create a private bookkeeping database where valmi.io can store sync state
-- Skip this step if working in read-only mode
CREATE DATABASE "VALMI_IO";

-- Give the valmi user full access to the bookkeeping database
-- Skip this step if working in read-only mode
GRANT ALL PRIVILEGES ON DATABASE "VALMI_IO" TO ROLE VALMI_ROLE;

-- Create a private bookkeeping schema where valmi.io can store sync state
-- Skip this step if working in read-only mode
CREATE SCHEMA "VALMI_IO".valmi_scratch_computed_;

-- Give the valmi user full access to the bookkeeping schema
-- Skip this step if working in read-only mode
GRANT ALL PRIVILEGES ON SCHEMA "VALMI_IO".valmi_scratch_computed_ TO ROLE VALMI_ROLE;

-- Give the valmi user the ability to create stages for unloading data
-- Skip this step if working in read-only mode
GRANT CREATE STAGE ON SCHEMA "VALMI_IO".valmi_scratch_computed_ TO ROLE VALMI_ROLE;

-- Let the valmi user see this database
GRANT USAGE ON DATABASE "<your database>" TO ROLE VALMI_ROLE;

-- Let the valmi user see this schema
GRANT USAGE ON SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;

-- Let the valmi user read all existing tables in this schema
GRANT SELECT ON ALL TABLES IN SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;

-- Let the valmi user read any new tables added to this schema
GRANT SELECT ON FUTURE TABLES IN SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;

-- Let the valmi user read all existing views in this schema
GRANT SELECT ON ALL VIEWS IN SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;

-- Let the valmi user read any new views added to this schema
GRANT SELECT ON FUTURE VIEWS IN SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;

-- Let the valmi user execute any existing functions in this schema
GRANT USAGE ON ALL FUNCTIONS IN SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;

-- Let the valmi user execute any new functions added to this schema
GRANT USAGE ON FUTURE FUNCTIONS IN SCHEMA "<your database>"."<your schema>" TO ROLE VALMI_ROLE;
```
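After running the script, you can sanity-check the setup with Snowflake’s SHOW commands, using the object names created above:

```sql
-- Privileges granted to the valmi role
SHOW GRANTS TO ROLE VALMI_ROLE;

-- Roles granted to the valmi user
SHOW GRANTS TO USER VALMI;

-- Confirm the warehouse exists and is configured as expected
SHOW WAREHOUSES LIKE 'VALMI_WAREHOUSE';
```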
Creating a Snowflake Connection on valmi.io
The picture below shows the configuration page for a Snowflake connection. The following fields are required:
- Connection Name - A name for the connection.
- Snowflake Account Name - Your Snowflake account identifier. Depending on your cloud region, it looks like ro64973.ap-south-1.aws. For more information on your Snowflake account identifier, refer to the Snowflake documentation.
- Warehouse - The warehouse for compute operations performed by valmi.io. If created with the above script, it is VALMI_WAREHOUSE.
- Username - The username to use to connect to the warehouse.
- Password - The password to use to connect to the warehouse.
Setting up a Sync
Selecting a table or a view to Sync
Once the connection is configured, creating a sync involves selecting a source, a destination, a mapping, and a schedule. Here we describe how to select a source. Selecting a Snowflake source while creating a sync requires the following steps:
- Select the Snowflake connection you created earlier.
- Select a Snowflake database.
- Select a namespace.
- Select a database table or a view.
Supported Source Modes
Once you have chosen a data source, you will be prompted to select a destination connection. For detailed instructions on configuring the destination, please consult the specific documentation provided. After selecting the destination, you will proceed to choose the mapping for the synchronization process.
On the mapping page, you will be required to specify a source mode that governs how the source connector retrieves data from Snowflake. There are two available modes to choose from:
- Full Table - This mode extracts the entire table from Snowflake. This is the default mode.
- Incremental - This mode extracts only the rows that have changed since the last sync. The data to extract will be determined by comparing the current table with a snapshot of the table taken during the last sync.
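As a sketch of how the incremental comparison works (table names are hypothetical), the changes can be thought of as two set differences against the snapshot taken during the last sync:

```sql
-- Hypothetical tables: "orders" (current) vs "orders_snapshot" (last sync)

-- Rows inserted or updated since the last sync
SELECT * FROM orders
EXCEPT
SELECT * FROM orders_snapshot;

-- Rows deleted since the last sync
SELECT * FROM orders_snapshot
EXCEPT
SELECT * FROM orders;
```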
Selecting a Primary Key
During the configuration process, you will be prompted to choose a primary key. The primary key serves multiple purposes:
- It acts as a unique identifier for objects in the destination. For instance, when synchronizing a Snowflake table with a Salesforce object, the primary key is used to identify the corresponding Salesforce object.
- It is used to determine the validity of rows, for example by checking for null or duplicate primary key values.
It is important to note that the primary key should consist of a single column. In cases where the table does not have a primary key, you have the option to select a unique column instead.
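The validity checks described above can be approximated with queries like these (table and column names are hypothetical):

```sql
-- Rows with a NULL primary key, which cannot be matched to a destination object
SELECT * FROM users WHERE id IS NULL;

-- Primary key values that appear more than once
SELECT id, COUNT(*) AS occurrences
FROM users
GROUP BY id
HAVING COUNT(*) > 1;
```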