Comparing Directory Structure Patterns for Snowflake × Terraform
Introduction
I'm kasama from the Data Business Division. In this article, I'd like to summarize the key points I considered when managing development/production Snowflake environments with Terraform, particularly focusing on directory structure.
Premise
As a premise, this discussion covers the open-source version of Terraform; HCP Terraform (the cloud offering) is out of scope.
I will reference the structure from the following blogs, with some customization:
I'll examine the following directory structure. This implementation creates modules per access role + resource unit and calls them from main.tf in each dev/prd environment; terraform init, plan, and apply are executed via GitHub Actions. (A minimal example of such a module call follows the tree below.)
snowflake-terraform-sample % tree
.
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn
│ └── create_state_gha_resources.yml
├── environments
│ ├── dev
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules
│ ├── access_role_and_database
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_file_format
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_schema
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_stage
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_storage_integration
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── access_role_and_warehouse
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── aws_storage_integration
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── functional_role
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
└── README.md
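As a rough sketch of this wiring (the module names and arguments follow the examples that appear later in this article, so treat the values as placeholders rather than the actual interface), environments/dev/main.tf might call the modules like this:
# environments/dev/main.tf (illustrative)
module "functional_role" {
  source    = "../../modules/functional_role"
  role_name = "SAMPLE_DEV_ROLE"
  comment   = "Sample functional role for dev"
}

module "database" {
  source = "../../modules/access_role_and_database"

  database_name               = "SAMPLE_DEV_DB"
  comment                     = "Sample database for dev"
  data_retention_time_in_days = 1
  access_role_name            = "SAMPLE_DEV_DB_ACCESS_ROLE"
  access_role_comment         = "Access role for the sample database"
  usage_grants                = [module.functional_role.role_name]
  create_schema_grants        = [module.functional_role.role_name]
  monitor_grants              = [module.functional_role.role_name]

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}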
Whether to separate directories for dev/prd
For multiple environments like dev/prd, it is also possible to implement this without separating the directory structure per environment.
Although I haven't verified this in practice, I believe the following could work: create backend and tfvars files for each environment and pass them as arguments to the Terraform commands in the CI/CD configuration (see the command sketch after the tree below). This approach could follow DRY principles without duplicating main.tf.
snowflake-terraform/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn
│ └── create_state_gha_resources.yml
├── terraform/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── versions.tf
│ ├── backends/ # Environment-specific backend config
│ │ ├── dev-backend.tf
│ │ └── prd-backend.tf
│ ├── environments/ # Environment-specific variable values
│ │ ├── dev.tfvars
│ │ └── prd.tfvars
├── modules
│ ├── :
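With this layout, the environment is chosen at command time rather than by directory. Here is a minimal sketch of what the GitHub Actions workflows would run, assuming the backend files contain only the key = value settings that terraform init can merge:
# Run from the terraform/ directory inside the dev workflow
terraform init  -backend-config=backends/dev-backend.tf
terraform plan  -var-file=environments/dev.tfvars
terraform apply -var-file=environments/dev.tfvars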
Comparison
Here's a comparison of both approaches:
| Comparison Item | Environment-specific Directory | Variable Separation (DRY Principle) |
|---|---|---|
| CI/CD Configuration | ⭕️ Specify directory by environment; simple configuration | ⚠️ Requires -backend-config for terraform init and -var-file for terraform plan/apply |
| Environment Differences | ⭕️ Create main.tf for each directory and call modules | ⭕️ Absorb differences with environment-specific tfvars files |
| Maintainability | ⚠️ Low; changes need to be reflected across all environments | ⭕️ High; complete with a single change |
| Risk of Operation Errors | ⭕️ Very low; physically separated | ⚠️ Low; controlled by CI/CD execution |
| Recommended Cases | When environment differences are significant; generally recommended structure | When environment differences are minimal; when DRY principles are prioritized |
While HashiCorp and Google Cloud best practices recommend environment-specific directory structures, it's important to select a structure appropriate to your project's situation. An environment-specific directory structure is particularly effective when environment differences are significant or when operational simplicity is a priority. For Snowflake, where dev/prd may differ in warehouse sizes or network policies, the environment-specific configuration is simpler. If the environment differences turn out to be minimal and the duplicated implementation becomes a real burden in operation, you could consider switching to the variable-separation structure.
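For instance, a warehouse-size difference between environments could live entirely in the tfvars files (the variable name and values here are illustrative assumptions):
# environments/dev.tfvars
warehouse_size = "XSMALL"

# environments/prd.tfvars
warehouse_size = "LARGE"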
I referenced the following articles:
Module Structure: Resource Type vs. User Type
While the current structure divides modules by resource type (database, file_format, schema, etc.), another option would be to group resources by user type:
snowflake-terraform/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd/
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules/
│ ├── admin/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ ├── analyst/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ ├── data_engineer/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ ├── aws_storage_integration/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── versions.tf
│ └── network_policy/
│ ├── main.tf
│ ├── variables.tf
│ └── versions.tf
This implementation defines resources in the main.tf for each user type and calls them from environments/<environment>/main.tf.
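# e.g., modules/admin/main.tf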
resource "snowflake_database" "main" {
:
}
resource "snowflake_schema" "main" {
:
}
resource "snowflake_warehouse" "main" {
:
}
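# e.g., modules/analyst/main.tf — the same resource definitions duplicated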
resource "snowflake_database" "main" {
:
}
resource "snowflake_schema" "main" {
:
}
resource "snowflake_warehouse" "main" {
:
}
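# environments/<environment>/main.tf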
module "sample_admin" {
source = "../../modules/admin"
:
}
module "sample_analyst" {
source = "../../modules/analyst"
:
}
Comparison
Here's a comparison of both approaches:
| Comparison Item | By Resource Type | By User Type |
|---|---|---|
| Code Duplication | ⭕️ Implementation of each resource is in one place; adheres to DRY principles | ⚠️ Same resource definitions scattered across multiple modules (e.g., warehouse duplicated in each role) |
| Adding New Resources | ⚠️ Multiple module calls needed (database + schema + warehouse, etc.) | ⭕️ Complete with one module call; all resources needed for a role created at once |
| Modifying Existing Resources | ⭕️ One change reflects across the board (e.g., change warehouse settings all at once) | ⚠️ Same change needed across multiple modules (e.g., modify warehouse settings for each role separately) |
| Managing Dependencies | ⚠️ Increased outputs/inputs wiring between modules (e.g., passing database.id to the schema module) | ⭕️ More self-contained within roles |
| Learning Curve | ⚠️ Requires understanding of wiring between modules | ⭕️ Intuitively understood via people's roles |
| Recommended Cases | When prioritizing resource reusability; generally recommended structure | When role-based resource patterns are fixed; when prioritizing ease of adding new resources |
Generally, organizing modules by resource type aligns better with DRY principles and reduces code duplication. However, organizing by user type is worth considering when resource patterns are fixed per role, and particularly when you need to easily add many instances of the same user type (e.g., admin_1, admin_2, ... admin_50).
Hybrid Structure
To leverage the advantages of both approaches, you could implement a hybrid structure: user-type modules are called from environments/<environment>/main.tf, and those user-type modules in turn call the resource-type modules:
snowflake-terraform-sample/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd/
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules/
│ ├── base_resources/ # Basic resource modules (DRY principle)
│ │ ├── access_role_and_database/
│ │ ├── access_role_and_file_format/
│ │ ├── access_role_and_schema/
│ │ ├── functional_role/
│ │ └── ... (other resources)
│ └── user_types/ # Modules by user type
│ ├── admin/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── data_engineer/
│ └── analyst/
└── README.md
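# environments/dev/main.tf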
# Resources for Admin
module "project_a_admin" {
source = "../../modules/user_types/admin"
project_name = "PROJECT_A"
environment = "DEV"
warehouse_size = "XSMALL"
data_retention_time_in_days = 1
providers = {
snowflake.sysadmin = snowflake.sysadmin
snowflake.securityadmin = snowflake.securityadmin
}
}
# Easy to add multiple Admin users
module "project_b_admin" {
source = "../../modules/user_types/admin"
project_name = "PROJECT_B"
environment = "DEV"
warehouse_size = "XSMALL"
data_retention_time_in_days = 1
providers = {
snowflake.sysadmin = snowflake.sysadmin
snowflake.securityadmin = snowflake.securityadmin
}
}
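# modules/user_types/admin/main.tf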
locals {
prefix = upper("${var.project_name}_${var.environment}_ADMIN")
}
# Create Functional Role
module "functional_role" {
source = "../../base_resources/functional_role"
role_name = "${local.prefix}_ROLE"
comment = "Admin functional role for ${var.project_name}"
}
# Create Database
module "database" {
source = "../../base_resources/access_role_and_database"
database_name = "${local.prefix}_DB"
comment = "Admin database for ${var.project_name}"
data_retention_time_in_days = var.data_retention_time_in_days
access_role_name = "${local.prefix}_DB_ACCESS_ROLE"
access_role_comment = "Access role for admin database"
usage_grants = [module.functional_role.role_name]
create_schema_grants = [module.functional_role.role_name]
monitor_grants = [module.functional_role.role_name]
}
# Create Schema
module "schema_public" {
source = "../../base_resources/access_role_and_schema"
database_name = module.database.database_name
schema_name = "PUBLIC"
comment = "Public schema for admin"
access_role_name = "${local.prefix}_PUBLIC_ACCESS_ROLE"
access_role_comment = "Access role for public schema"
usage_grants = [module.functional_role.role_name]
create_table_grants = [module.functional_role.role_name]
}
# Create File Format
module "file_format_csv" {
source = "../../base_resources/access_role_and_file_format"
database_name = module.database.database_name
schema_name = module.schema_public.schema_name
file_format_name = "${local.prefix}_CSV_FORMAT"
format_type = "CSV"
comment = "CSV file format for admin"
access_role_name = "${local.prefix}_CSV_FORMAT_ACCESS_ROLE"
access_role_comment = "Access role for CSV file format"
usage_grants = [module.functional_role.role_name]
}
Although the module structure becomes slightly more complex, it achieves both the advantage of easily adding many instances of the same user type and the advantage of reflecting changes across the board with a single modification.
Directory Structure When Resource Count Increases
So far, our implementation has called modules from a single environments/<environment>/main.tf file. While this structure works well with few resources, it becomes challenging to manage when the count grows to 100 or 200. Here are some solutions:
File Separation by Resource
snowflake-terraform-sample % tree
.
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn
│ └── create_state_gha_resources.yml
├── environments
│ ├── dev
│ │ ├── backend.tf
│ │ ├── main.tf → Common processing
│ │ ├── databases.tf → Resource-specific
│ │ ├── file_formats.tf → Resource-specific
│ │ ├── schemas.tf → Resource-specific
│ │ ├── ... (other resources)
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd
│ ├── backend.tf
│ ├── main.tf → Common processing
│ ├── databases.tf → Resource-specific
│ ├── file_formats.tf → Resource-specific
│ ├── schemas.tf → Resource-specific
│ ├── ... (other resources)
│ ├── outputs.tf
│ ├── variables.tf
│ └── versions.tf
├── modules/
│ ├── access_role_and_database/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── :
This pattern creates one tf file per resource type, such as databases.tf and file_formats.tf (a sketch of one such file follows). If the code volume within databases.tf grows, you could further split it into databases_a.tf, databases_b.tf, and so on. However, excessive file splitting leads to too many files in the environment folder, which becomes hard to manage.
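As an illustration, environments/dev/databases.tf might look like the following; the argument names mirror the module examples elsewhere in this article and the values are placeholders:
# environments/dev/databases.tf (illustrative)
module "database_1" {
  source = "../../modules/access_role_and_database"

  database_name               = "DATABASE_1_DEV"
  comment                     = "Development database 1"
  data_retention_time_in_days = 1
  access_role_name            = "DATABASE_1_DEV_ACCESS_ROLE"
  access_role_comment         = "Access role for database 1"
  usage_grants                = ["ADMIN_ROLE_1", "SYSADMIN"]
  create_schema_grants        = ["ADMIN_ROLE_1"]
  monitor_grants              = ["ADMIN_ROLE_1"]

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}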
Subdirectory Structure Using YAML Files
snowflake-terraform-sample/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── backend.tf
│ │ ├── main.tf # Loads YAML and calls modules
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ ├── versions.tf
│ │ └── yml/ # YAML definitions
│ │ ├── databases/
│ │ │ ├── database_1.yml
│ │ │ ├── database_2.yml
│ │ ├── file_formats/
│ │ │ ├── file_format_1.yml
│ │ │ ├── file_format_2.yml
│ │ ├── schemas/
│ │ │ ├── schema_1.yml
│ │ │ ├── schema_2.yml
│ │ └── ... (other resources)
│ └── prd/
│ ├── backend.tf
│ ├── main.tf # Loads YAML and calls modules
│ ├── outputs.tf
│ ├── variables.tf
│ ├── versions.tf
│ └── yml/ # Resource definitions
│ ├── databases/
│ │ ├── database_1.yml
│ │ ├── database_2.yml
│ ├── file_formats/
│ │ ├── file_format_1.yml
│ │ ├── file_format_2.yml
│ ├── schemas/
│ │ ├── schema_1.yml
│ │ ├── schema_2.yml
│ └── ... (other resources)
├── modules/
│ ├── access_role_and_database/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── :
This implementation defines parameters in YAML files and references them in environments/<environment>/main.tf.
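# environments/dev/yml/databases/database_1.yml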
name: DATABASE_1_DEV
comment: "Development database 1"
data_retention_time_in_days: 1
access_role:
name: DATABASE_1_DEV_ACCESS_ROLE
comment: "Access role for database 1"
grants:
usage:
- ADMIN_ROLE_1
- SYSADMIN
create_schema:
- ADMIN_ROLE_1
monitor:
- ADMIN_ROLE_1
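# environments/dev/main.tf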
# ============================================================
# Loading YAML files
# ============================================================
locals {
# Decode each YAML file
database_configs = {
for f in fileset("${path.module}/yml/databases", "*.yml") :
trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/databases/${f}"))
}
file_format_configs = {
for f in fileset("${path.module}/yml/file_formats", "*.yml") :
trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/file_formats/${f}"))
}
schema_configs = {
for f in fileset("${path.module}/yml/schemas", "*.yml") :
trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/schemas/${f}"))
}
}
# ============================================================
# Database module calls
# ============================================================
module "database_1" {
source = "../../modules/access_role_and_database"
database_name = local.database_configs["database_1"].name
comment = local.database_configs["database_1"].comment
data_retention_time_in_days = local.database_configs["database_1"].data_retention_time_in_days
access_role_name = local.database_configs["database_1"].access_role.name
access_role_comment = local.database_configs["database_1"].access_role.comment
usage_grants = local.database_configs["database_1"].grants.usage
create_schema_grants = local.database_configs["database_1"].grants.create_schema
monitor_grants = local.database_configs["database_1"].grants.monitor
providers = {
snowflake.sysadmin = snowflake.sysadmin
snowflake.securityadmin = snowflake.securityadmin
}
}
module "database_2" {
source = "../../modules/access_role_and_database"
database_name = local.database_configs["database_2"].name
comment = local.database_configs["database_2"].comment
data_retention_time_in_days = local.database_configs["database_2"].data_retention_time_in_days
access_role_name = local.database_configs["database_2"].access_role.name
access_role_comment = local.database_configs["database_2"].access_role.comment
usage_grants = local.database_configs["database_2"].grants.usage
create_schema_grants = local.database_configs["database_2"].grants.create_schema
monitor_grants = try(local.database_configs["database_2"].grants.monitor, [])
providers = {
snowflake.sysadmin = snowflake.sysadmin
snowflake.securityadmin = snowflake.securityadmin
}
}
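If the per-database module blocks become repetitive, one possible refinement, not part of the structure above, is to iterate over the decoded YAML map with for_each; treat this as a sketch:
module "databases" {
  source   = "../../modules/access_role_and_database"
  for_each = local.database_configs

  database_name               = each.value.name
  comment                     = each.value.comment
  data_retention_time_in_days = each.value.data_retention_time_in_days
  access_role_name            = each.value.access_role.name
  access_role_comment         = each.value.access_role.comment
  usage_grants                = each.value.grants.usage
  create_schema_grants        = each.value.grants.create_schema
  monitor_grants              = try(each.value.grants.monitor, [])

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
Note that existing resources would then need terraform state mv (or moved blocks) to adopt the new module.databases["database_1"] addresses.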
Using YAML allows you to reduce the code volume in main.tf while maintaining a subdirectory structure. However, syntax errors or type mismatches in YAML can't be detected until terraform plan execution, and terraform validate can't validate values within YAML. Also, dependencies in YAML are described as strings, making it harder to understand resource dependencies compared to explicit references in HCL (like module.database_1.name). These factors may complicate debugging and maintenance.
I referenced the following article for YAML-based structure:
State Separation Structure
snowflake-terraform-sample/
├── .github/
│ └── workflows/
│ ├── dev-snowflake-terraform-cicd.yml
│ └── prd-snowflake-terraform-cicd.yml
├── cfn/
│ └── create_state_gha_resources.yml
├── environments/
│ ├── dev/
│ │ ├── common/
│ │ │ ├── backend.tf
│ │ │ ├── main.tf
│ │ │ ├── outputs.tf
│ │ │ ├── variables.tf
│ │ │ └── versions.tf
│ │ ├── databases/
│ │ │ ├── backend.tf
│ │ │ ├── main.tf
│ │ │ ├── outputs.tf
│ │ │ ├── variables.tf
│ │ │ └── versions.tf
│ │ ├── file_formats/
│ │ │ ├── backend.tf
│ │ │ ├── main.tf
│ │ │ ├── outputs.tf
│ │ │ ├── variables.tf
│ │ │ └── versions.tf
│ │ └── schemas/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── prd/
│ ├── common/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── databases/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── file_formats/
│ │ ├── backend.tf
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ └── schemas/
│ ├── backend.tf
│ ├── main.tf
│ ├── outputs.tf
│ └── variables.tf
├── modules/
│ ├── access_role_and_database/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── variables.tf
│ │ └── versions.tf
│ ├── :
When separating states, resources with dependencies like database and schema require reference through terraform_remote_state.
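# environments/dev/databases/outputs.tf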
output "database_1_name" {
value = module.database_1.database_name
}
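# environments/dev/schemas/main.tf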
# Load other state files
data "terraform_remote_state" "databases" {
backend = "s3"
config = {
bucket = "terraform-state-bucket"
key = "snowflake/dev/databases/terraform.tfstate" # ← Different state file
region = "ap-northeast-1"
}
}
module "schema_1" {
source = "../../modules/access_role_and_schema"
# Get value via remote_state
database_name = data.terraform_remote_state.databases.outputs.database_1_name
schema_name = "PUBLIC"
}
While state separation lets you keep a subdirectory structure using tf files only, managing a state per resource group complicates dependency management. You also need to control the deployment order (common → databases → schemas), which complicates the CI/CD pipeline configuration.
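A rough sketch of that ordering inside the dev workflow (assuming terraform init has already run in each directory):
# Each state is applied before the states that read its outputs
terraform -chdir=environments/dev/common apply
terraform -chdir=environments/dev/databases apply
terraform -chdir=environments/dev/schemas apply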
Comparison
I compared each approach.
| Comparison Item | File Separation Structure | YAML Utilization Structure | State Separation Structure |
|---|---|---|---|
| Directory Structure | ⭕️ Flat; .tf files placed directly under the environment | ⭕️ Hierarchical; YAML organized in subdirectories | ⭕️ Hierarchical; directories by resource type |
| Code Volume per File | ⚠️ High; each resource defined in one tf file | ⭕️ Low; defined in YAML, main.tf stays concise | ⭕️ Low; output and data references added |
| State Management | ⭕️ Single state | ⭕️ Single state | ⚠️ Multiple states (separated) |
| Dependency Management | ⭕️ Simple; referenced via module.xxx within the same state | ⚠️ Implicit; string references within YAML | ⚠️ Complex; terraform_remote_state required |
| CI/CD Configuration | ⭕️ Simple; completed in one execution | ⭕️ Simple; completed in one execution | ⚠️ Complex; execution order control needed |
| Performance | ⭕️ Standard | ⭕️ Standard | ⭕️ High; can execute only changed parts |
| Learning Curve | ⭕️ Low; standard Terraform structure | ⚠️ Medium; requires understanding the custom YAML structure | ⚠️ High; requires understanding state references |
| Recommended Cases | Single-state management with few resources | Single-state management with a subdirectory structure desired | Subdirectory structure desired with tf files only |
My personal view is that, unless you expect a large number of resources, it's best to start with the file separation structure under a single state. This configuration is the simplest and follows Terraform's standard usage, keeping the learning curve low for team members. In the initial stages, splitting files by resource type, such as databases.tf and schemas.tf, is perfectly manageable.
If you know from the beginning that the number of resources will grow, choosing the state separation structure is the better option. While state separation makes dependency management more complex, it offers benefits in large-scale environments, such as better performance and a limited scope of impact. If you adopt the state separation structure later, you'll need to run terraform state mv for each module.
For the YAML utilization structure, I believe the disadvantages of debugging complexity outweigh the benefits.
In conclusion, at this stage I think the approach of "starting simple and gradually increasing complexity as needed" is best.
Terragrunt is also an option. It specializes in managing multiple states and is a valid choice if you expect large-scale resource management (e.g., 100+ resources) from the beginning, or if the team can afford the learning cost.
In Closing
The above reflects my thinking at the time of writing; it may well change after years of actual operation, so please treat it as one reference point.
