My opinion of role configuration using dbt in Snowflake #dbt #SnowflakeDB

2022.02.12

この記事は公開されてから1年以上経過しています。情報が古い可能性がありますので、ご注意ください。

(This article is English ver of this article)

Hi! I'm Sagara(In Japanese, "さがら".)

dbt is a great match for building a data mart in Snowflake.

However, when you actually try to use dbt and Snowflake together, you may wonder, "What kind of role configuration should I use? I'm sure many of you are wondering.

So, I thought about the Snowflake role configuration for using dbt, and I summarized it in this article.

My opinion of role configuration using dbt in Snowflake

The following diagram shows what I consider to be the role configuration for using dbt in Snowflake.

Description of each role

  • dbt_admin_role
    • A role that manages all databases associated with the dbt project.
    • Since sysadmin can access databases that are not related to dbt, I thought it would be better to have a role under sysadmin to manage databases created by dbt.
  • dbt_aaa_dev_role
    • A role that executes the job of dbt project "aaa" to generate a data mart, and is associated with it by option when dbt project is created.
    • By giving the CREATE SCHEMA permission in the database set in the dbt project, you can create a development schema and a production schema for each dbt user, and create tables and views for each schema.
  • dbt_aaa_select_role
    • A role that refers to the data mart created by the dbt project "aaa".
    • By making the schema that this role can refer to only the schema for the production environment, the users of this role cannot see the development schema by dbt.
    • By making it a child role of dbt_aaa_dev_role, you can avoid multiple management of USAGE and SELECT privileges.
  • After that, every time we create another dbt project, we will create a role with the naming rules dbt_bbb_dev_role, dbt_bbb_select_role, and so on.

*For details on the required privilege in this role configuration, see the sample query at the end of this article.

Concerns with this configuration

Some users may want to JOIN tables in databases created by multiple dbt projects.

Since the dbt project is paired with a single database, there will be situations where you will need a role for users who want to JOIN and SELECT tables in different databases generated from multiple dbt projects.

I think there are three main ways to deal with this problem.

  • Using secondary roles
    • However, it is necessary to check in advance if it is safe to join the target databases together.(Like this article's example.)
  • Create a parent role that has a role as a child that can refer to databases created by multiple dbt projects with SELECT.
    • You need to be careful because there is a risk of increasing the number of roles as the number of dbt projects increases, but this is a good way to go if you are in a situation where you cannot use secondary roles.
  • In one dbt project, define a new model that defines JOIN with a table created in another dbt project.
    • In my opinion, this method is a bad practice.
    • The reason for this is that while developing with dbt allows data lineage to be automatically built and traced back to the original table, quoting a table generated in a different project makes it impossible to trace the original table within a single project, and the advantage of using dbt is lost.

Reference: Sample query to create the role configuration of this article.

  • Database with the original data:aaa_db
  • Development database for dbt:dbt_aaa_db
  • The schema where the production data created by dbt will be placed:production
    • In the deployment settings of dbt, you need to set this schema as the deployment destination schema
-- Create the necessary roles.

use role securityadmin;
create role dbt_admin_role;
create role dbt_aaa_dev_role;
create role dbt_aaa_select_role;

-- Make the created role a tree structure from sysadmin.

grant role dbt_admin_role to role sysadmin;
grant role dbt_aaa_dev_role to role dbt_admin_role;
grant role dbt_aaa_select_role to role dbt_aaa_dev_role;

-- Create a warehouse for job execution in dbt

use role sysadmin;

create or replace warehouse dbt_wh with
    warehouse_size = 'XSMALL'
    warehouse_type = 'STANDARD'
    auto_suspend = 60
    auto_resume = true
    min_cluster_count = 1
    max_cluster_count = 1
    scaling_policy = 'STANDARD'
    initially_suspended = TRUE;

grant usage on warehouse dbt_wh to role dbt_aaa_dev_role;

-- Create a warehouse for analysis by issuing queries against the data mart created by dbt * This is optional. You can use an existing warehouse for analysis.
use role sysadmin;

create or replace warehouse analysys_wh with
    warehouse_size = 'XSMALL'
    warehouse_type = 'STANDARD'
    auto_suspend = 60
    auto_resume = true
    min_cluster_count = 1
    max_cluster_count = 1
    scaling_policy = 'STANDARD'
    initially_suspended = TRUE;

grant usage on warehouse analysys_wh to role dbt_aaa_select_role;

-- Define the required database.

use role sysadmin;
create or replace database aaa_db clone citibike; -- A database with the original data that dbt refers to. Since this is a sample, it is newly created using clone.
create database dbt_aaa_db; -- A database that corresponds to the "dbt development and production environment" in which the development and production schemas for dbt are placed.

-- Move the ownership of the databases that correspond to dbt's development and production environments so that dbt_admin_role can manage them.

use role sysadmin;
grant ownership on database dbt_aaa_db to role dbt_admin_role;

-- Grant the necessary privileges to the dbt development role to enable development on dbt for databases that are part of the dbt development and production environment.

use role dbt_admin_role;
grant create schema on database dbt_aaa_db to role dbt_aaa_dev_role;
grant usage on database dbt_aaa_db to role dbt_aaa_select_role;

-- Create a production schema in advance for the database that corresponds to the dbt development and production environments (in order to grant SELECT privileges in advance).

use role dbt_aaa_dev_role;
create schema production;

-- Grant USAGE privilege to the production schema where dbt-created tables are stored so that the SELECT role can refer to them.

use role dbt_aaa_dev_role;
grant usage on schema dbt_aaa_db.production to role dbt_aaa_select_role;

-- Enable the SELECT role to SELECT all future tables and views created in the production schema against the production schema where dbt-created tables are stored.
use role securityadmin;
grant select on future tables in schema dbt_aaa_db.production to role dbt_aaa_select_role;
grant select on future views in schema dbt_aaa_db.production to role dbt_aaa_select_role;

-- Grant USAGE privilege to the database and schema with the original data referenced by dbt, and SELECT privilege to all tables.
use role sysadmin;
grant usage on database aaa_db to role dbt_aaa_dev_role;
grant usage on schema aaa_db.public to role dbt_aaa_dev_role;
grant select on all tables in schema aaa_db.public to role dbt_aaa_dev_role;

use role securityadmin;
grant usage on future schemas in database aaa_db to role dbt_aaa_dev_role;
grant select on future tables in schema aaa_db.public to role dbt_aaa_dev_role;
grant select on future views in schema aaa_db.public to role dbt_aaa_dev_role;