[新機能]dbtのdata testで任意の設定をconfigで指定できるようになり、各data testでSnowflakeのウェアハウスを指定できるようになりました

[新機能]dbtのdata testで任意の設定をconfigで指定できるようになり、各data testでSnowflakeのウェアハウスを指定できるようになりました

Clock Icon2024.11.15

さがらです。

dbt-coreのv1.9.0b1で、Update data_test to accept arbitrary config optionsというアップデートがありました。

https://github.com/dbt-labs/dbt-core/releases/tag/v1.9.0b1

関連するIssueは下記となります。

https://github.com/dbt-labs/dbt-core/issues/10197

このアップデートにより、dbtのdata testで任意の設定をconfigで指定できるようになり、具体例をあげると各data testでSnowflakeのウェアハウスを指定できるようになりました。

実際に試してみたので、本記事でまとめてみます。

やってみた

設定方法はシンプルで、下記のように各date_testsのテストの中でconfigを追加し、snowflake_warehouseで使いたいウェアハウスを指定するだけです。

version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        description: Primary key
        data_tests:
          - unique
          - not_null:
              config:
                snowflake_warehouse: sagara_analysys_wh

今回はdbt Cloudで検証したのですが、デフォルトのウェアハウスは下図のようにsagara_dbt_dev_whとしています。この状態で上述のようにnot_nullのテストでsagara_analysys_whを指定した時にどのような挙動となるかを見てみます。

2024-11-15_06h07_27

この上でdbt buildを実行してみます。configの指定の有無で、下記のようにログが異なっているのがわかると思います。configsnowflake_warehouseを指定した場合はselect current_warehouse() as warehouseuse warehouse sagara_analysys_whが追加で実行されています。

  • configsnowflake_warehouseを指定したテスト
21:12:34 Began running node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
21:12:34 13 of 14 START test not_null_customers_customer_id ............................. [RUN]
21:12:34 Began compiling node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
21:12:34 Writing injected SQL for node "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:34 Began executing node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d
21:12:34 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:34 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
select current_warehouse() as warehouse
21:12:35 SQL status: SUCCESS 1 in 0.213 seconds
21:12:35 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
use warehouse sagara_analysys_wh
21:12:35 SQL status: SUCCESS 1 in 0.199 seconds
21:12:35 Writing runtime sql for node "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (

select customer_id
from sagara_dbt_test_db.dbt_ssagara.customers
where customer_id is null

    ) dbt_internal_test
21:12:35 SQL status: SUCCESS 1 in 0.613 seconds
21:12:35 Using snowflake connection "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"
21:12:35 On test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d"} */
use warehouse SAGARA_DBT_DEV_WH
21:12:36 SQL status: SUCCESS 1 in 0.213 seconds
21:12:36 13 of 14 PASS not_null_customers_customer_id ................................... [PASS in 1.26s]
21:12:36 Finished running node test.jaffle_shop.not_null_customers_customer_id.5c9bf9911d

  • configsnowflake_warehouseを指定していないテスト
21:12:32 Began running node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
21:12:32 10 of 14 START test not_null_stg_orders_order_id ............................... [RUN]
21:12:32 Began compiling node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
21:12:32 Writing injected SQL for node "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"
21:12:32 Began executing node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64
21:12:32 Writing runtime sql for node "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"
21:12:32 Using snowflake connection "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"
21:12:32 On test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64"} */
select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (

select order_id
from sagara_dbt_test_db.dbt_ssagara.stg_orders
where order_id is null

    ) dbt_internal_test
21:12:33 SQL status: SUCCESS 1 in 0.627 seconds
21:12:33 10 of 14 PASS not_null_stg_orders_order_id ..................................... [PASS in 0.67s]
21:12:33 Finished running node test.jaffle_shop.not_null_stg_orders_order_id.81cfe2fe64

注意事項

このアップデートには1つ注意点があり、ModelのConfigとしてウェアハウスを指定した場合は、Modelの実行時は指定したウェアハウスを使いますが、テスト実行時にはデフォルトのウェアハウスが動いてしまいます。

この事象については、下記の記事の「現状も残る課題」の内容が参考になります。

https://zenn.dev/pei0804/articles/dbt-snowflake-dynamic-warehouse#現状も残る課題

例えば、Modelに関するyamlの内容とdbt_project.ymlの内容を下記のようにしたとします。

  • Modelに関するyaml
version: 2

models:
  - name: customers
    columns:
      - name: customer_id
        description: Primary key
        data_tests:
          - unique
          - not_null
  • dbt_project.yml ※modelsのみ抜粋
models:
  jaffle_shop:
    snowflake_warehouse: sagara_analysys_wh
    staging:
      materialized: view
    marts:
      materialized: table

この状態で実行すると、下記のようなログが流れます。customers.sqlのModel実行のときだけ、select current_warehouse() as warehouseuse warehouse sagara_analysys_whが実行されているのがわかると思います。

  • customers.sqlのModel実行
21:22:54 Began running node model.jaffle_shop.customers
21:22:54 12 of 14 START sql table model dbt_ssagara.customers ........................... [RUN]
21:22:54 Began compiling node model.jaffle_shop.customers
21:22:54 Writing injected SQL for node "model.jaffle_shop.customers"
21:22:54 Began executing node model.jaffle_shop.customers
21:22:54 Using snowflake connection "model.jaffle_shop.customers"
21:22:54 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
select current_warehouse() as warehouse
21:22:54 SQL status: SUCCESS 1 in 0.205 seconds
21:22:54 Using snowflake connection "model.jaffle_shop.customers"
21:22:54 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
use warehouse sagara_analysys_wh
21:22:54 SQL status: SUCCESS 1 in 0.232 seconds
21:22:54 Writing runtime sql for node "model.jaffle_shop.customers"
21:22:54 Using snowflake connection "model.jaffle_shop.customers"
21:22:54 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
create or replace transient table sagara_dbt_test_db.dbt_ssagara.customers
         as
        (with
customers as (select * from sagara_dbt_test_db.dbt_ssagara.stg_customers),

orders as (select * from sagara_dbt_test_db.dbt_ssagara.stg_orders),

customer_orders as (
    select
        customer_id,
        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders
    from orders
    group by 1
),

final as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders
    from customers
    left join
        customer_orders
        on customers.customer_id = customer_orders.customer_id
)

select *
from final
        );
21:22:56 SQL status: SUCCESS 1 in 1.079 seconds
21:22:56 Using snowflake connection "model.jaffle_shop.customers"
21:22:56 On model.jaffle_shop.customers: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "model.jaffle_shop.customers"} */
use warehouse SAGARA_DBT_DEV_WH
21:22:56 SQL status: SUCCESS 1 in 0.194 seconds
21:22:56 12 of 14 OK created sql table model dbt_ssagara.customers ...................... [SUCCESS 1 in 1.73s]
21:22:56 Finished running node model.jaffle_shop.customers
  • customerscustomer_id列に対するuniqueテスト
21:22:56 Began running node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
21:22:56 14 of 14 START test unique_customers_customer_id ............................... [RUN]
21:22:56 Began compiling node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
21:22:56 Writing injected SQL for node "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"
21:22:56 Began executing node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1
21:22:56 Writing runtime sql for node "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"
21:22:56 Using snowflake connection "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"
21:22:56 On test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1: /* {"app": "dbt", "dbt_version": "2024.11.14+16d4939", "profile_name": "user", "target_name": "dev", "node_id": "test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1"} */
select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (

select
    customer_id as unique_field,
    count(*) as n_records

from sagara_dbt_test_db.dbt_ssagara.customers
where customer_id is not null
group by customer_id
having count(*) > 1

    ) dbt_internal_test
21:22:56 SQL status: SUCCESS 1 in 0.355 seconds
21:22:56 14 of 14 PASS unique_customers_customer_id ..................................... [PASS in 0.38s]
21:22:56 Finished running node test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1

最後に

dbtのdata testで任意の設定をconfigで指定できるようになったので、各data testでSnowflakeのウェアハウスを指定することを試してみました。

記事上で述べた注意点はありますが、各テストで違うウェアハウスを使用したいケースはあると思いますので、その時にはぜひご活用ください!

Share this article

facebook logohatena logotwitter logo

© Classmethod, Inc. All rights reserved.