[新機能]dbtのdata testで任意の設定をconfigで指定できるようになり、各data testでSnowflakeのウェアハウスを指定できるようになりました
さがらです。
dbt-coreのv1.9.0b1で、Update data_test to accept arbitrary config options
というアップデートがありました。
関連するIssueは下記となります。
このアップデートにより、dbtのdata testで任意の設定をconfigで指定できるようになり、具体例をあげると各data testでSnowflakeのウェアハウスを指定できるようになりました。
実際に試してみたので、本記事でまとめてみます。
やってみた
設定方法はシンプルで、下記のように各date_tests
のテストの中でconfig
を追加し、snowflake_warehouse
で使いたいウェアハウスを指定するだけです。
version: 2
models:
- name: customers
columns:
- name: customer_id
description: Primary key
data_tests:
- unique
- not_null:
config:
snowflake_warehouse: sagara_analysys_wh
今回はdbt Cloudで検証したのですが、デフォルトのウェアハウスは下図のようにsagara_dbt_dev_wh
としています。この状態で上述のようにnot_nullのテストでsagara_analysys_wh
を指定した時にどのような挙動となるかを見てみます。
この上でdbt build
を実行してみます。config
の指定の有無で、下記のようにログが異なっているのがわかると思います。config
でsnowflake_warehouse
を指定した場合はselect current_warehouse() as warehouse
とuse warehouse sagara_analysys_wh
が追加で実行されています。
config
でsnowflake_warehouse
を指定したテスト
21:12:34 Began running node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
21:12:34 13 of 14 START test not_null_customers_customer_id ............................. [RUN]
21:12:34 Began compiling node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
21:12:34 Writing injected SQL for node "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:34 Began executing node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
21:12:34 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:34 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
select current_warehouse() as warehouse
21:12:35 SQL status: SUCCESS 1 in 0.213 seconds
21:12:35 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
use warehouse sagara_analysys_wh
21:12:35 SQL status: SUCCESS 1 in 0.199 seconds
21:12:35 Writing runtime sql for node "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
select
count(*) as failures,
count(*) != 0 as should_warn,
count(*) != 0 as should_error
from (
select customer_id
from sagara_dbt_test_db.dbt_ssagara.customers
where customer_id is null
) dbt_internal_test
21:12:35 SQL status: SUCCESS 1 in 0.613 seconds
21:12:35 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
use warehouse SAGARA_DBT_DEV_WH
21:12:36 SQL status: SUCCESS 1 in 0.213 seconds
21:12:36 13 of 14 PASS not_null_customers_customer_id ................................... [[32mPASS[0m in 1.26s]
21:12:36 Finished running node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
config
でsnowflake_warehouse
を指定していないテスト
21:12:32 Began running node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
21:12:32 10 of 14 START test not_null_stg_orders_order_id ............................... [RUN]
21:12:32 Began compiling node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
21:12:32 Writing injected SQL for node "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"
21:12:32 Began executing node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
21:12:32 Writing runtime sql for node "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"
21:12:32 Using snowflake connection "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"
21:12:32 On test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"} */
select
count(*) as failures,
count(*) != 0 as should_warn,
count(*) != 0 as should_error
from (
select order_id
from sagara_dbt_test_db.dbt_ssagara.stg_orders
where order_id is null
) dbt_internal_test
21:12:33 SQL status: SUCCESS 1 in 0.627 seconds
21:12:33 10 of 14 PASS not_null_stg_orders_order_id ..................................... [[32mPASS[0m in 0.67s]
21:12:33 Finished running node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
注意事項
このアップデートには1つ注意点があり、ModelのConfigとしてウェアハウスを指定した場合は、Modelの実行時は指定したウェアハウスを使いますが、テスト実行時にはデフォルトのウェアハウスが動いてしまいます。
この事象については、下記の記事の「現状も残る課題」の内容が参考になります。
例えば、Modelに関するyamlの内容とdbt_project.yml
の内容を下記のようにしたとします。
- Modelに関するyaml
version: 2
models:
- name: customers
columns:
- name: customer_id
description: Primary key
data_tests:
- unique
- not_null
dbt_project.yml
※models
のみ抜粋
models:
jaffle_shop:
snowflake_warehouse: sagara_analysys_wh
staging:
materialized: view
marts:
materialized: table
この状態で実行すると、下記のようなログが流れます。customers.sql
のModel実行のときだけ、select current_warehouse() as warehouse
とuse warehouse sagara_analysys_wh
が実行されているのがわかると思います。
customers.sql
のModel実行
21:22:54 Began running node model.jaffle_shop.customers
21:22:54 12 of 14 START sql table model dbt_ssagara.customers ........................... [RUN]
21:22:54 Began compiling node model.jaffle_shop.customers
21:22:54 Writing injected SQL for node "model.jaffle_shop.customers"
21:22:54 Began executing node model.jaffle_shop.customers
21:22:54 Using snowflake connection "model.jaffle_shop.customers"
21:22:54 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
select current_warehouse() as warehouse
21:22:54 SQL status: SUCCESS 1 in 0.205 seconds
21:22:54 Using snowflake connection "model.jaffle_shop.customers"
21:22:54 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
use warehouse sagara_analysys_wh
21:22:54 SQL status: SUCCESS 1 in 0.232 seconds
21:22:54 Writing runtime sql for node "model.jaffle_shop.customers"
21:22:54 Using snowflake connection "model.jaffle_shop.customers"
21:22:54 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
create or replace transient table sagara_dbt_test_db.dbt_ssagara.customers
as
(with
customers as (select * from sagara_dbt_test_db.dbt_ssagara.stg_customers),
orders as (select * from sagara_dbt_test_db.dbt_ssagara.stg_orders),
customer_orders as (
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from orders
group by 1
),
final as (
select
customers.customer_id,
customers.first_name,
customers.last_name,
customer_orders.first_order_date,
customer_orders.most_recent_order_date,
coalesce(customer_orders.number_of_orders, 0) as number_of_orders
from customers
left join
customer_orders
on customers.customer_id = customer_orders.customer_id
)
select *
from final
);
21:22:56 SQL status: SUCCESS 1 in 1.079 seconds
21:22:56 Using snowflake connection "model.jaffle_shop.customers"
21:22:56 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
use warehouse SAGARA_DBT_DEV_WH
21:22:56 SQL status: SUCCESS 1 in 0.194 seconds
21:22:56 12 of 14 OK created sql table model dbt_ssagara.customers ...................... [[32mSUCCESS 1[0m in 1.73s]
21:22:56 Finished running node model.jaffle_shop.customers
customers
のcustomer_id
列に対するunique
テスト
21:22:56 Began running node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
21:22:56 14 of 14 START test unique_customers_customer_id ............................... [RUN]
21:22:56 Began compiling node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
21:22:56 Writing injected SQL for node "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"
21:22:56 Began executing node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
21:22:56 Writing runtime sql for node "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"
21:22:56 Using snowflake connection "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"
21:22:56 On test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"} */
select
count(*) as failures,
count(*) != 0 as should_warn,
count(*) != 0 as should_error
from (
select
customer_id as unique_field,
count(*) as n_records
from sagara_dbt_test_db.dbt_ssagara.customers
where customer_id is not null
group by customer_id
having count(*) > 1
) dbt_internal_test
21:22:56 SQL status: SUCCESS 1 in 0.355 seconds
21:22:56 14 of 14 PASS unique_customers_customer_id ..................................... [[32mPASS[0m in 0.38s]
21:22:56 Finished running node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
最後に
dbtのdata testで任意の設定をconfigで指定できるようになったので、各data testでSnowflakeのウェアハウスを指定することを試してみました。
記事上で述べた注意点はありますが、各テストで違うウェアハウスを使用したいケースはあると思いますので、その時にはぜひご活用ください!