Comparing Directory Structure Patterns for Snowflake × Terraform
Introduction
I am kasama from the Data Business Division. In this article, I summarize the points I considered regarding directory structure when managing development and production Snowflake environments with Terraform.

## Premise
As a premise, this article covers the open-source version of Terraform; HCP Terraform (the cloud version) is not considered.
I referenced the structures from the following blogs and customized them slightly for my own use.
I will proceed from the following directory structure. This implementation creates modules per access role + resource unit and calls them from main.tf in each of the dev/prd environments. The terraform init, plan, and apply commands are executed via GitHub Actions.
```txt
snowflake-terraform-sample % tree
.
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn
│ └── create_state_gha_resources.yml
├── environments
│ ├── dev
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules
│ ├── access_role_and_database
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_file_format
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_schema
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_stage
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_storage_integration
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_warehouse
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── aws_storage_integration
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── functional_role
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
└── README.md
```

## Whether to separate directories for dev/prd
When having multiple environments like dev/prd, there's an implementation approach that doesn't separate directory structures.
Since I haven't actually verified this, I'm writing based on assumptions about how it could be implemented, but I believe you can create environment-specific backend settings and tfvars files, then have CI/CD pass the appropriate arguments to the terraform commands, enabling a DRY implementation without duplicating main.tf.
```txt
snowflake-terraform/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn
│ └── create_state_gha_resources.yml
├── terraform/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── versions.tf
│ ├── backends/ # Environment-specific backend settings
│ │ ├── dev-backend.tf
│ │ └── prd-backend.tf
│ ├── environments/ # Environment-specific variable values
│ │ ├── dev.tfvars
│ │ └── prd.tfvars
├── modules
│ ├── :
```
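The CI/CD invocation could then look roughly like the following workflow excerpt. This is a sketch, not the actual workflow: step names are placeholders, and note that the file passed to `-backend-config` must contain only `key = value` backend attributes, not a full `terraform { backend … }` block.

```yml
# Hypothetical excerpt from dev-snowflake-terraform-cicd.yml
- name: Terraform Init (dev)
  working-directory: terraform
  run: terraform init -backend-config=backends/dev-backend.tf

- name: Terraform Plan (dev)
  working-directory: terraform
  run: terraform plan -var-file=environments/dev.tfvars
```

The prd workflow would be identical except for the file names, which is exactly where the DRY benefit comes from.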
### Comparison
Here's a comparison of the approaches.
| Comparison Item | Environment-Specific Directory Structure | Variable Separation Structure (DRY principle) |
|---------|----------------------|-------------------|
| **CI/CD Configuration** | ⭕️ Specify directories per environment<br>Simple configuration | ⚠️ Need to specify `-backend-config` when running terraform init, and `-var-file` for terraform plan/apply commands |
| **Environment Differences** | ⭕️ Create main.tf in each directory calling modules | ⭕️ Absorb differences with environment-specific tfvars files |
| **Maintainability** | ⚠️ Low<br>Changes need to be reflected in all environments | ⭕️ High<br>Changes complete in one place |
| **Risk of Operational Errors** | ⭕️ Very low<br>Physically separated | ⚠️ Somewhat higher<br>Mitigated by CI/CD controlling execution |
| **Recommended Cases** | When environment differences are significant<br>Generally recommended structure | When environment differences are small<br>When prioritizing DRY principle |
While HashiCorp's and Google Cloud's best practices recommend environment-specific directory structures, it's important to choose a structure that fits your project's situation. Environment-specific directories are particularly effective when environment differences are significant or when simplicity of operation is a priority. With Snowflake, dev and prd often differ in warehouse sizes or network policy settings, so the environment-specific layout seems good for its simplicity. As operations continue, you can consider moving to a variable-separation structure if environment differences shrink and the cost of duplicated implementation becomes significant.
I referenced the following articles:
https://developer.hashicorp.com/terraform/language/style#module-structure
https://cloud.google.com/docs/terraform/best-practices/root-modules?hl=ja
https://zenn.dev/sway/articles/terraform_style_envcomparisontable
https://medium.com/snowflake/so-you-want-to-terraform-snowflake-a6d16ca3237e

## Module Structure: Resource Type vs User Type
The current structure separates modules by resource type (database, file_format, schema, etc.), but alternatively we could organize resources by user type as shown below:
```txt
snowflake-terraform/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
├── modules/
│ ├── admin/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ ├── analyst/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ ├── data_engineer/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ ├── aws_storage_integration/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ └── network_policy/
│ ├── main.tf
│ ├── variables.tf
│ └── versions.tf
```
This implementation defines resources in main.tf for each user type and calls them from environments/&lt;environment&gt;/main.tf.
```hcl
# e.g. modules/admin/main.tf
resource "snowflake_database" "main" {
  :
}

resource "snowflake_schema" "main" {
  :
}

resource "snowflake_warehouse" "main" {
  :
}
```

```hcl
# e.g. modules/analyst/main.tf
resource "snowflake_database" "main" {
  :
}

resource "snowflake_schema" "main" {
  :
}

resource "snowflake_warehouse" "main" {
  :
}
```
```hcl
# e.g. environments/dev/main.tf
module "sample_admin" {
  source = "../../modules/admin"
  :
}

module "sample_analyst" {
  source = "../../modules/analyst"
  :
}
```

### Comparison
I compared each approach.
| Comparison Item | Per Resource Type | Per User Type |
|---------|-----------------|-----------------|
| **Code Duplication** | ⭕️ Implementation of each resource is in one place<br>Adheres to DRY principle | ⚠️ Same resource definitions scattered across multiple modules<br>(e.g., warehouse duplicated for each role) |
| **Adding New Resources** | ⚠️ Requires calling multiple modules<br>(database + schema + warehouse, etc.) | ⭕️ Complete with a single module call<br>Resources needed for the role are created at once |
| **Modifying Existing Resources** | ⭕️ Changes in one place reflect across the system<br>(e.g., warehouse settings changed globally) | ⚠️ Same change needs to be applied to multiple modules<br>(e.g., warehouse settings modified individually for each role) |
| **Dependency Management** | ⚠️ Increases outputs/inputs wiring between modules<br>(e.g., passing database.id to schema module) | ⭕️ Tends to be self-contained within roles |
| **Learning Curve** | ⚠️ Requires understanding of module wiring | ⭕️ More intuitive to grasp based on "people's roles" |
| **Recommended Cases** | When resource reusability is prioritized<br>Generally recommended structure | When resource patterns for each role are fixed<br>When simplicity of new additions is prioritized |
Generally, managing by `Resource Type` aligns with the DRY principle and reduces code duplication. Depending on the project, however, managing by `User Type` is also worth considering: it can be appropriate when the resource pattern created for each role is fixed, and when you want to easily add many instances of the same user type (admin_1, admin_2...admin_50).

### Hybrid Configuration
As a configuration that takes advantage of both approaches, you can call a module for `each user type` from `environments/<environment>/main.tf`, and within each `user type` module, call the modules for `each resource type`.
```txt
snowflake-terraform-sample/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd/
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules/
│ ├── base_resources/ # Base resource modules (DRY principle)
│ │ ├── access_role_and_database/
│ │ ├── access_role_and_file_format/
│ │ ├── access_role_and_schema/
│ │ ├── functional_role/
│ │ └── ... (other resources)
│ └── user_types/ # Modules for each user type
│ ├── admin/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── data_engineer/
│ └── analyst/
└── README.md
```

```hcl
# e.g. environments/dev/main.tf
# Resources for Admin
module "project_a_admin" {
  source = "../../modules/user_types/admin"

  project_name                = "PROJECT_A"
  environment                 = "DEV"
  warehouse_size              = "XSMALL"
  data_retention_time_in_days = 1

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}

# Easily add multiple admin users if needed
module "project_b_admin" {
  source = "../../modules/user_types/admin"

  project_name                = "PROJECT_B"
  environment                 = "DEV"
  warehouse_size              = "XSMALL"
  data_retention_time_in_days = 1

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
```
```hcl
# e.g. modules/user_types/admin/main.tf
locals {
  prefix = upper("${var.project_name}_${var.environment}_ADMIN")
}

# Create Functional Role
module "functional_role" {
  source = "../../base_resources/functional_role"

  role_name = "${local.prefix}_ROLE"
  comment   = "Admin functional role for ${var.project_name}"
}

# Create Database
module "database" {
  source = "../../base_resources/access_role_and_database"

  database_name               = "${local.prefix}_DB"
  comment                     = "Admin database for ${var.project_name}"
  data_retention_time_in_days = var.data_retention_time_in_days
  access_role_name            = "${local.prefix}_DB_ACCESS_ROLE"
  access_role_comment         = "Access role for admin database"
  usage_grants                = [module.functional_role.role_name]
  create_schema_grants        = [module.functional_role.role_name]
  monitor_grants              = [module.functional_role.role_name]
}

# Create Schema
module "schema_public" {
  source = "../../base_resources/access_role_and_schema"

  database_name       = module.database.database_name
  schema_name         = "PUBLIC"
  comment             = "Public schema for admin"
  access_role_name    = "${local.prefix}_PUBLIC_ACCESS_ROLE"
  access_role_comment = "Access role for public schema"
  usage_grants        = [module.functional_role.role_name]
  create_table_grants = [module.functional_role.role_name]
}

# Create File Format
module "file_format_csv" {
  source = "../../base_resources/access_role_and_file_format"

  database_name       = module.database.database_name
  schema_name         = module.schema_public.schema_name
  file_format_name    = "${local.prefix}_CSV_FORMAT"
  format_type         = "CSV"
  comment             = "CSV file format for admin"
  access_role_name    = "${local.prefix}_CSV_FORMAT_ACCESS_ROLE"
  access_role_comment = "Access role for CSV file format"
  usage_grants        = [module.functional_role.role_name]
}
```

The module structure becomes somewhat more complex, but this achieves both advantages: from the user-type side, easy addition of many users of the same type (admin_1, admin_2...admin_50), and from the resource-type side, changes made in one place propagate across the entire system.
## Directory structure when resources increase
In the implementation so far, we called every module from a single file, `environments/<environment>/main.tf`. This works well while resources are few, but managing a single file becomes quite challenging once resources grow to 100 or 200. I'd like to summarize several possible solutions.
### File separation by resource type
```txt
snowflake-terraform-sample % tree
.
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn
│ └── create_state_gha_resources.yml
├── environments
│ ├── dev
│ │ ├── backend.tf
│ │ ├── main.tf → Common processing
│ │ ├── databases.tf → Resource-specific
│ │ ├── file_formats.tf → Resource-specific
│ │ ├── schemas.tf → Resource-specific
│ │ ├── ... (other resources)
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd
│ ├── backend.tf
│ ├── main.tf → Common processing
│ ├── databases.tf → Resource-specific
│ ├── file_formats.tf → Resource-specific
│ ├── schemas.tf → Resource-specific
│ ├── ... (other resources)
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules/
│ ├── access_role_and_database/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── :
```
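For illustration, a databases.tf under this pattern would simply gather the database-related module calls. A minimal sketch, reusing the module interface from the earlier examples (the module name and values are placeholders):

```hcl
# e.g. environments/dev/databases.tf
# Only database-related module calls live in this file.
module "database_1" {
  source                      = "../../modules/access_role_and_database"
  database_name               = "DATABASE_1_DEV"
  comment                     = "Development database 1"
  data_retention_time_in_days = 1
  access_role_name            = "DATABASE_1_DEV_ACCESS_ROLE"
}
```

Since all files in a directory form one Terraform root module, no other wiring is needed to split code this way.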
This pattern creates a separate tf file for each resource type, such as databases.tf and file_formats.tf. If the code volume for databases grows, you could split further into databases_a.tf, databases_b.tf, and so on. However, excessive splitting creates its own management challenge: too many files in the environment folder.

### Subdirectory Structure Using YAML Files
```txt
snowflake-terraform-sample/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── backend.tf
│ │ ├── main.tf # Loads YAML and calls modules
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ ├── versions.tf
│ │ └── yml/ # YAML definitions
│ │ ├── databases/
│ │ │ ├── database_1.yml
│ │ │ ├── database_2.yml
│ │ ├── file_formats/
│ │ │ ├── file_format_1.yml
│ │ │ ├── file_format_2.yml
│ │ ├── schemas/
│ │ │ ├── schema_1.yml
│ │ │ ├── schema_2.yml
│ │ └── ... (other resources)
│ └── prd/
│ ├── backend.tf
│ ├── main.tf # Loads YAML and calls modules
│ ├── outputs.tf
│ ├── variables.tf
│ ├── versions.tf
│ └── yml/ # Resource definitions
│ ├── databases/
│ │ ├── database_1.yml
│ │ ├── database_2.yml
│ ├── file_formats/
│ │ ├── file_format_1.yml
│ │ ├── file_format_2.yml
│ ├── schemas/
│ │ ├── schema_1.yml
│ │ ├── schema_2.yml
│ └── ... (other resources)
├── modules/
│ ├── access_role_and_database/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── :
```
This is an implementation that defines parameters in YAML files and references them in environments/&lt;environment&gt;/main.tf.
```yml
# e.g. environments/dev/yml/databases/database_1.yml
name: DATABASE_1_DEV
comment: "Development database 1"
data_retention_time_in_days: 1
access_role:
  name: DATABASE_1_DEV_ACCESS_ROLE
  comment: "Access role for database 1"
grants:
  usage:
    - ADMIN_ROLE_1
    - SYSADMIN
  create_schema:
    - ADMIN_ROLE_1
  monitor:
    - ADMIN_ROLE_1
```
```hcl
# e.g. environments/dev/main.tf
# ============================================================
# Loading YAML files
# ============================================================
locals {
  # Decode each YAML file
  database_configs = {
    for f in fileset("${path.module}/yml/databases", "*.yml") :
    trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/databases/${f}"))
  }

  file_format_configs = {
    for f in fileset("${path.module}/yml/file_formats", "*.yml") :
    trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/file_formats/${f}"))
  }

  schema_configs = {
    for f in fileset("${path.module}/yml/schemas", "*.yml") :
    trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/schemas/${f}"))
  }
}

# ============================================================
# Database module calls
# ============================================================
module "database_1" {
  source = "../../modules/access_role_and_database"

  database_name               = local.database_configs["database_1"].name
  comment                     = local.database_configs["database_1"].comment
  data_retention_time_in_days = local.database_configs["database_1"].data_retention_time_in_days
  access_role_name            = local.database_configs["database_1"].access_role.name
  access_role_comment         = local.database_configs["database_1"].access_role.comment
  usage_grants                = local.database_configs["database_1"].grants.usage
  create_schema_grants        = local.database_configs["database_1"].grants.create_schema
  monitor_grants              = local.database_configs["database_1"].grants.monitor

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}

module "database_2" {
  source = "../../modules/access_role_and_database"

  database_name               = local.database_configs["database_2"].name
  comment                     = local.database_configs["database_2"].comment
  data_retention_time_in_days = local.database_configs["database_2"].data_retention_time_in_days
  access_role_name            = local.database_configs["database_2"].access_role.name
  access_role_comment         = local.database_configs["database_2"].access_role.comment
  usage_grants                = local.database_configs["database_2"].grants.usage
  create_schema_grants        = local.database_configs["database_2"].grants.create_schema
  monitor_grants              = try(local.database_configs["database_2"].grants.monitor, [])

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
```

By using YAML, you can reduce the amount of code in main.tf while keeping a subdirectory structure. The disadvantages: YAML syntax errors and data-type mismatches surface only at `terraform plan`, and `terraform validate` cannot check values inside the YAML. Also, dependencies in YAML are written as strings, so it is harder to see which resources depend on which than with explicit HCL references (like module.database_1.name). These factors can make debugging and maintenance more complex.
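As a side note, the per-database module blocks above could be collapsed with module-level `for_each` (supported since Terraform 0.13). A sketch, assuming the same `local.database_configs` as above, with optional keys handled via `try()`:

```hcl
# One module block covering every YAML file in yml/databases/
module "databases" {
  source   = "../../modules/access_role_and_database"
  for_each = local.database_configs

  database_name               = each.value.name
  comment                     = each.value.comment
  data_retention_time_in_days = each.value.data_retention_time_in_days
  access_role_name            = each.value.access_role.name
  access_role_comment         = each.value.access_role.comment
  usage_grants                = each.value.grants.usage
  create_schema_grants        = each.value.grants.create_schema
  monitor_grants              = try(each.value.grants.monitor, [])

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
```

Be aware that converting existing per-name module blocks to `for_each` changes their state addresses, so `moved` blocks or `terraform state mv` would be needed to avoid recreation.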
The following article was referenced for the YAML-based configuration:
https://datumstudio.jp/blog/0131_terraform_snowflake_role_creating/

### State Separation Configuration
```txt
snowflake-terraform-sample/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── common/
│ │ │ ├── backend.tf
│ │ │ ├── main.tf
│ │ │ ├── outputs.tf
│ │ │ ├── variables.tf
│ │ │ └── versions.tf
│ │ ├── databases/
│ │ │ ├── backend.tf
│ │ │ ├── main.tf
│ │ │ ├── outputs.tf
│ │ │ ├── variables.tf
│ │ │ └── versions.tf
│ │ ├── file_formats/
│ │ │ ├── backend.tf
│ │ │ ├── main.tf
│ │ │ ├── outputs.tf
│ │ │ ├── variables.tf
│ │ │ └── versions.tf
│ │ └── schemas/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd/
│ ├── common/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── databases/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── file_formats/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── schemas/
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules/
│ ├── access_role_and_database/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── :
```
When using state separation, resources with dependencies, such as databases and schemas, must be referenced via terraform_remote_state.
```hcl
# e.g. environments/dev/databases/outputs.tf
output "database_1_name" {
  value = module.database_1.database_name
}
```

```hcl
# e.g. environments/dev/schemas/main.tf
# Loading another state file
data "terraform_remote_state" "databases" {
  backend = "s3"
  config = {
    bucket = "terraform-state-bucket"
    key    = "snowflake/dev/databases/terraform.tfstate" # ← Different state file
    region = "ap-northeast-1"
  }
}

module "schema_1" {
  source = "../../../modules/access_role_and_schema"

  # Get the value via remote state
  database_name = data.terraform_remote_state.databases.outputs.database_1_name
  schema_name   = "PUBLIC"
}
```

State separation lets you keep a subdirectory structure using only tf files, but it results in per-resource-type state management, which makes dependency handling more complex. You also need to control deployment order (common → databases → schemas), so the CI/CD pipeline configuration becomes more complicated.
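The ordering requirement can be expressed in GitHub Actions with job-level `needs:`. A hedged sketch (job names, runner, and `-auto-approve` usage are placeholders, not the actual workflow):

```yml
# Hypothetical excerpt: apply the dev stacks in dependency order
jobs:
  common:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform -chdir=environments/dev/common init && terraform -chdir=environments/dev/common apply -auto-approve
  databases:
    needs: common
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform -chdir=environments/dev/databases init && terraform -chdir=environments/dev/databases apply -auto-approve
  schemas:
    needs: databases
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform -chdir=environments/dev/schemas init && terraform -chdir=environments/dev/schemas apply -auto-approve
```

Every new stack adds another job and another edge to this graph, which is the concrete form of the "CI/CD becomes more complicated" drawback.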
### Comparison
Here's a comparison of each approach:
| Comparison Item | File Separation Structure | YAML Utilization Structure | State Separation Structure |
|---------|-----------------|-------------|--------------|
| **Directory Structure** | ⭕️ Flat<br>tf files placed directly under environment | ⭕️ Hierarchical<br>YAML organized in subdirectories | ⭕️ Hierarchical<br>Directories by resource type |
| **Code Volume per File** | ⚠️ Large<br>Each resource defined in one tf | ⭕️ Small<br>Defined in YAML, main.tf is concise | ⭕️ Small<br>Output and data references added |
| **State Management** | ⭕️ Single State | ⭕️ Single State | ⚠️ Multiple States (separated) |
| **Dependency Management** | ⭕️ Simple<br>`module.xxx` references within same state | ⚠️ Implicit<br>String references in YAML | ⚠️ Complex<br>`terraform_remote_state` required |
| **CI/CD Configuration** | ⭕️ Simple<br>Completed in one execution | ⭕️ Simple<br>Completed in one execution | ⚠️ Complex<br>Execution order control needed |
| **Performance** | ⭕️ Standard | ⭕️ Standard | ⭕️ High<br>Can execute only changed parts |
| **Learning Cost** | ⭕️ Low<br>Standard Terraform configuration | ⚠️ Medium<br>Understanding of custom YAML structure needed | ⚠️ High<br>Understanding of cross-state references required |
| **Recommended Cases** | Single state management with few resources | Single state management with desired subdirectory structure | When wanting subdirectory structure with tf files only |
My personal view is that it's best to start with a single-state file separation structure. This configuration is the simplest and follows Terraform's standard usage, keeping the learning cost low for team members. In the initial stages, file divisions by resource type like databases.tf and schemas.tf should be sufficient for management.
As the number of resources increases, considering migration to state separation structure becomes practical. While state separation makes dependency management more complex, it offers benefits like performance improvements and impact scope limitation in large-scale environments.
Regarding YAML utilization structure, I believe the disadvantages of debugging complexity outweigh the benefits.
In conclusion, I currently think the approach of "starting simple and gradually increasing complexity as needed" is best.
Terragrunt is another option. It is a tool that specializes in managing multiple states, and it is an effective choice when large-scale resource management (e.g., 100+ resources) is anticipated from the start, or when the team has the capacity to learn it.
https://zenn.dev/simpleform_blog/articles/20240701-multi-account-snowflake-with-terragrunt
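For reference, a minimal Terragrunt layout sketch. The directory layout, file paths, and input values here are hypothetical, not taken from the linked article:

```hcl
# e.g. live/dev/databases/terragrunt.hcl
include "root" {
  # Pull in shared remote-state and provider settings from a parent terragrunt.hcl
  path = find_in_parent_folders()
}

terraform {
  # Reuse the existing resource-type module as-is
  source = "../../../modules//access_role_and_database"
}

inputs = {
  database_name               = "DATABASE_1_DEV"
  comment                     = "Development database 1"
  data_retention_time_in_days = 1
}
```

Each such directory gets its own state automatically, and dependencies between stacks can be declared with `dependency` blocks, which `terragrunt run-all apply` uses to determine execution order.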
## Finally
The content described above is merely my thinking at the time of writing and may change after years of actual operation, so please consider it as reference only.