Comparing Directory Structure Patterns for Snowflake × Terraform

2025.09.07


Introduction

I'm kasama from the Data Business Division. In this article, I'd like to summarize the key points I considered when managing development/production Snowflake environments with Terraform, particularly focusing on directory structure.

Premise

As a premise, this discussion covers the open-source version of Terraform; HCP Terraform (the cloud version) is out of scope.
I will base the structure on the following blogs, with some customization:
https://dev.classmethod.jp/articles/snowflake-terraform-design-with-functional-and-access-role/
https://dev.classmethod.jp/articles/snowflake-terraform-how-to-terraform-with-github-actions/

I'll examine the following directory structure. This implementation creates one module per access-role-and-resource combination and calls the modules from the main.tf of each dev/prd environment. The terraform init, plan, and apply commands are executed via GitHub Actions.

snowflake-terraform-sample % tree
.
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn
│   └── create_state_gha_resources.yml
├── environments
│   ├── dev
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   └── prd
│       ├── backend.tf
│       ├── main.tf
│       ├── outputs.tf
│       ├── variables.tf
│       └── versions.tf
├── modules
│   ├── access_role_and_database
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── access_role_and_file_format
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── access_role_and_schema
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── access_role_and_stage
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── access_role_and_storage_integration
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── access_role_and_warehouse
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── aws_storage_integration
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   └── functional_role
│       ├── main.tf
│       ├── outputs.tf
│       ├── variables.tf
│       └── versions.tf
└── README.md

Whether to separate directories for dev/prd

For multiple environments like dev/prd, it's also possible to implement without separating the directory structure.
Although I haven't verified this in practice, I believe the following could work: create backend and tfvars files per environment, and pass them as arguments when running the Terraform commands in CI/CD. This approach follows DRY principles because main.tf is not duplicated.

snowflake-terraform/
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn
│   └── create_state_gha_resources.yml
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── versions.tf
│   ├── backends/               # Environment-specific backend config
│   │   ├── dev-backend.tf
│   │   └── prd-backend.tf
│   ├── environments/           # Environment-specific variable values
│   │   ├── dev.tfvars
│   │   └── prd.tfvars
├── modules
│   ├── :
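As a rough sketch of how this could wire together (I haven't verified this setup, and the backend values below are hypothetical): versions.tf declares an empty backend block, the files under backends/ hold only key/value pairs, and CI/CD injects them explicitly at init time:

```hcl
# terraform/versions.tf (excerpt): declare the backend with no values,
# so environment-specific settings can be injected at init time
terraform {
  backend "s3" {}
}

# terraform/backends/dev-backend.tf would contain only key/value pairs.
# Terraform ignores .tf files in subdirectories, so this file is read
# only when passed via -backend-config. (Values are hypothetical.)
#
#   bucket = "my-terraform-state"
#   key    = "snowflake/dev/terraform.tfstate"
#   region = "ap-northeast-1"
#
# CI/CD would then run, from the terraform/ directory:
#   terraform init -backend-config=backends/dev-backend.tf
#   terraform plan -var-file=environments/dev.tfvars
```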

Comparison

Here's a comparison of both approaches:

Comparison items: Environment-specific Directory vs. Variable Separation (DRY Principle)

- CI/CD Configuration
  - Environment-specific directory: ⭕️ Simple; just point the pipeline at the environment's directory
  - Variable separation: ⚠️ Requires -backend-config for terraform init and -var-file for terraform plan/apply
- Environment Differences
  - Environment-specific directory: ⭕️ Each directory has its own main.tf that calls the modules
  - Variable separation: ⭕️ Differences absorbed by environment-specific tfvars files
- Maintainability
  - Environment-specific directory: ⚠️ Low; every change must be reflected in all environments
  - Variable separation: ⭕️ High; a single change is enough
- Risk of Operation Errors
  - Environment-specific directory: ⭕️ Very low; environments are physically separated
  - Variable separation: ⚠️ Low; safety depends on the CI/CD execution
- Recommended Cases
  - Environment-specific directory: When environment differences are significant; the generally recommended structure
  - Variable separation: When environment differences are minimal, or when DRY principles are prioritized

While HashiCorp and Google Cloud best practices recommend environment-specific directory structures, it's important to select a structure that fits your project's situation. The environment-specific structure is particularly effective when environment differences are significant or when operational simplicity is a priority. For Snowflake, where dev/prd may differ in warehouse sizes or network policies, the environment-specific configuration is simpler. If environment differences turn out to be minimal and the duplicated implementation becomes a real burden in operation, you could consider switching to the variable-separation structure.

I referenced the following articles:
https://developer.hashicorp.com/terraform/language/style#module-structure
https://cloud.google.com/docs/terraform/best-practices/root-modules?hl=ja
https://zenn.dev/sway/articles/terraform_style_envcomparisontable
https://medium.com/snowflake/so-you-want-to-terraform-snowflake-a6d16ca3237e

Module Structure: Resource Type vs. User Type

While the current structure divides modules by resource type (database, file_format, schema, etc.), another option would be to group resources by user type:

snowflake-terraform/
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn/
│   └── create_state_gha_resources.yml
├── environments/
│   ├── dev/
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   └── prd/
│       ├── backend.tf
│       ├── main.tf
│       ├── outputs.tf
│       ├── variables.tf
│       └── versions.tf
├── modules/
│   ├── admin/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf
│   ├── analyst/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf
│   ├── data_engineer/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf
│   ├── aws_storage_integration/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf
│   └── network_policy/
│       ├── main.tf
│       ├── variables.tf
│       └── versions.tf

This implementation defines resources in the main.tf for each user type and calls them from environments/<environment>/main.tf.

modules/admin/main.tf
resource "snowflake_database" "main" {
  :
}
resource "snowflake_schema" "main" {
  :
}
resource "snowflake_warehouse" "main" {
  :
}
modules/analyst/main.tf
resource "snowflake_database" "main" {
  :
}
resource "snowflake_schema" "main" {
  :
}
resource "snowflake_warehouse" "main" {
  :
}
environments/<environment>/main.tf
module "sample_admin" {
  source = "../../modules/admin"
  :
}
module "sample_analyst" {
  source = "../../modules/analyst"
  :
}

Comparison

Here's a comparison of both approaches:

Comparison items: By Resource Type vs. By User Type

- Code Duplication
  - By resource type: ⭕️ Each resource is implemented in one place, adhering to DRY principles
  - By user type: ⚠️ The same resource definitions are scattered across modules (e.g., a warehouse duplicated in each role)
- Adding New Resources
  - By resource type: ⚠️ Multiple module calls needed (database + schema + warehouse, etc.)
  - By user type: ⭕️ One module call creates everything a role needs
- Modifying Existing Resources
  - By resource type: ⭕️ One change applies across the board (e.g., warehouse settings updated at once)
  - By user type: ⚠️ The same change must be repeated in each role's module
- Managing Dependencies
  - By resource type: ⚠️ More outputs/inputs wiring between modules (e.g., passing database.id to the schema module)
  - By user type: ⭕️ Largely self-contained within each role
- Learning Curve
  - By resource type: ⚠️ Requires understanding the wiring between modules
  - By user type: ⭕️ Intuitive, since modules map to people's roles
- Recommended Cases
  - By resource type: When resource reusability is the priority; the generally recommended structure
  - By user type: When role-based resource patterns are fixed, or when ease of adding new role instances is the priority

Generally, organizing modules by resource type is more in line with DRY principles and reduces code duplication. However, organizing by user type is worth considering when your project's resource patterns are fixed per role, especially if you need to easily add many instances of the same user type (e.g., admin_1, admin_2, ... admin_50).

Hybrid Structure

To leverage the advantages of both approaches, you could implement a hybrid structure where modules By User Type are called from environments/<environment>/main.tf, and within these user type modules, modules By Resource Type are called:

snowflake-terraform-sample/
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn/
│   └── create_state_gha_resources.yml
├── environments/
│   ├── dev/
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   └── prd/
│       ├── backend.tf
│       ├── main.tf
│       ├── outputs.tf
│       ├── variables.tf
│       └── versions.tf
├── modules/
│   ├── base_resources/  # Basic resource modules (DRY principle)
│   │   ├── access_role_and_database/
│   │   ├── access_role_and_file_format/
│   │   ├── access_role_and_schema/
│   │   ├── functional_role/
│   │   └── ... (other resources)
│   └── user_types/  # Modules by user type
│       ├── admin/
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   ├── variables.tf
│       │   └── versions.tf
│       ├── data_engineer/
│       └── analyst/
└── README.md
environments/dev/main.tf
# Resources for Admin
module "project_a_admin" {
  source = "../../modules/user_types/admin"

  project_name                = "PROJECT_A"
  environment                 = "DEV"
  warehouse_size              = "XSMALL"
  data_retention_time_in_days = 1

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}

# Easy to add multiple Admin users
module "project_b_admin" {
  source = "../../modules/user_types/admin"

  project_name                = "PROJECT_B"
  environment                 = "DEV"
  warehouse_size              = "XSMALL"
  data_retention_time_in_days = 1

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
modules/user_types/admin/main.tf
locals {
  prefix = upper("${var.project_name}_${var.environment}_ADMIN")
}

# Create Functional Role
module "functional_role" {
  source = "../../base_resources/functional_role"

  role_name = "${local.prefix}_ROLE"
  comment   = "Admin functional role for ${var.project_name}"
}

# Create Database
module "database" {
  source = "../../base_resources/access_role_and_database"

  database_name               = "${local.prefix}_DB"
  comment                     = "Admin database for ${var.project_name}"
  data_retention_time_in_days = var.data_retention_time_in_days

  access_role_name    = "${local.prefix}_DB_ACCESS_ROLE"
  access_role_comment = "Access role for admin database"

  usage_grants         = [module.functional_role.role_name]
  create_schema_grants = [module.functional_role.role_name]
  monitor_grants       = [module.functional_role.role_name]
}

# Create Schema
module "schema_public" {
  source = "../../base_resources/access_role_and_schema"

  database_name = module.database.database_name
  schema_name   = "PUBLIC"
  comment       = "Public schema for admin"

  access_role_name    = "${local.prefix}_PUBLIC_ACCESS_ROLE"
  access_role_comment = "Access role for public schema"

  usage_grants        = [module.functional_role.role_name]
  create_table_grants = [module.functional_role.role_name]
}

# Create File Format
module "file_format_csv" {
  source = "../../base_resources/access_role_and_file_format"

  database_name    = module.database.database_name
  schema_name      = module.schema_public.schema_name
  file_format_name = "${local.prefix}_CSV_FORMAT"
  format_type      = "CSV"
  comment          = "CSV file format for admin"

  access_role_name    = "${local.prefix}_CSV_FORMAT_ACCESS_ROLE"
  access_role_comment = "Access role for CSV file format"

  usage_grants = [module.functional_role.role_name]
}

Although the module structure becomes slightly more complex, it achieves both the advantage of easily adding many instances of the same user type and the advantage of reflecting changes across the board with a single modification.

Directory Structure When Resource Count Increases

So far, our implementation has called modules from a single environments/<environment>/main.tf file. While this structure works well with few resources, it becomes challenging to manage when the count grows to 100 or 200. Here are some solutions:

File Separation by Resource

snowflake-terraform-sample % tree
.
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn
│   └── create_state_gha_resources.yml
├── environments
│   ├── dev
│   │   ├── backend.tf
│   │   ├── main.tf → Common processing
│   │   ├── databases.tf → Resource-specific
│   │   ├── file_formats.tf → Resource-specific
│   │   ├── schemas.tf → Resource-specific
│   │   ├── ... (other resources)
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   └── prd
│       ├── backend.tf
│       ├── main.tf → Common processing
│       ├── databases.tf → Resource-specific
│       ├── file_formats.tf → Resource-specific
│       ├── schemas.tf → Resource-specific
│       ├── ... (other resources)
│       ├── outputs.tf
│       ├── variables.tf
│       └── versions.tf
├── modules/
│   ├── access_role_and_database/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── :

This pattern creates one tf file per resource type, such as databases.tf and file_formats.tf. If the database code grows, you could split it further into databases_a.tf and databases_b.tf. However, excessive splitting leaves too many files directly under the environment directory, which in turn becomes hard to manage.
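For example, databases.tf would hold only the database-related module calls. A minimal sketch, assuming the module interface shown later in this article (the module name and values here are illustrative):

```hcl
# environments/dev/databases.tf: database-related module calls only
# (module name and values are illustrative)
module "sales_db" {
  source = "../../modules/access_role_and_database"

  database_name               = "SALES_DB_DEV"
  comment                     = "Sales database for dev"
  data_retention_time_in_days = 1

  access_role_name    = "SALES_DB_DEV_ACCESS_ROLE"
  access_role_comment = "Access role for the sales database"

  usage_grants         = ["ADMIN_ROLE_1"]
  create_schema_grants = ["ADMIN_ROLE_1"]
  monitor_grants       = ["ADMIN_ROLE_1"]
}
```

file_formats.tf and schemas.tf would follow the same pattern with their respective modules.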

Subdirectory Structure Using YAML Files

snowflake-terraform-sample/
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn/
│   └── create_state_gha_resources.yml
├── environments/
│   ├── dev/
│   │   ├── backend.tf
│   │   ├── main.tf              # Loads YAML and calls modules
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   ├── versions.tf
│   │   └── yml/           # YAML definitions
│   │       ├── databases/
│   │       │   ├── database_1.yml
│   │       │   ├── database_2.yml
│   │       ├── file_formats/
│   │       │   ├── file_format_1.yml
│   │       │   ├── file_format_2.yml
│   │       ├── schemas/
│   │       │   ├── schema_1.yml
│   │       │   ├── schema_2.yml
│   │       └── ... (other resources)
│   └── prd/
│       ├── backend.tf
│       ├── main.tf              # Loads YAML and calls modules
│       ├── outputs.tf
│       ├── variables.tf
│       ├── versions.tf
│       └── yml/           # Resource definitions
│           ├── databases/
│           │   ├── database_1.yml
│           │   ├── database_2.yml
│           ├── file_formats/
│           │   ├── file_format_1.yml
│           │   ├── file_format_2.yml
│           ├── schemas/
│           │   ├── schema_1.yml
│           │   ├── schema_2.yml
│           └── ... (other resources)
├── modules/
│   ├── access_role_and_database/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── :

This implementation defines parameters in YAML files and references them in environments/<environment>/main.tf.

environments/dev/yml/databases/database_1.yml
name: DATABASE_1_DEV
comment: "Development database 1"
data_retention_time_in_days: 1

access_role:
  name: DATABASE_1_DEV_ACCESS_ROLE
  comment: "Access role for database 1"

grants:
  usage:
    - ADMIN_ROLE_1
    - SYSADMIN
  create_schema:
    - ADMIN_ROLE_1
  monitor:
    - ADMIN_ROLE_1
environments/dev/main.tf
# ============================================================
# Loading YAML files
# ============================================================

locals {
  # Decode each YAML file
  database_configs = {
    for f in fileset("${path.module}/yml/databases", "*.yml") :
    trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/databases/${f}"))
  }

  file_format_configs = {
    for f in fileset("${path.module}/yml/file_formats", "*.yml") :
    trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/file_formats/${f}"))
  }

  schema_configs = {
    for f in fileset("${path.module}/yml/schemas", "*.yml") :
    trimsuffix(f, ".yml") => yamldecode(file("${path.module}/yml/schemas/${f}"))
  }
}

# ============================================================
# Database module calls
# ============================================================

module "database_1" {
  source = "../../modules/access_role_and_database"

  database_name               = local.database_configs["database_1"].name
  comment                     = local.database_configs["database_1"].comment
  data_retention_time_in_days = local.database_configs["database_1"].data_retention_time_in_days

  access_role_name    = local.database_configs["database_1"].access_role.name
  access_role_comment = local.database_configs["database_1"].access_role.comment

  usage_grants         = local.database_configs["database_1"].grants.usage
  create_schema_grants = local.database_configs["database_1"].grants.create_schema
  monitor_grants       = local.database_configs["database_1"].grants.monitor

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}

module "database_2" {
  source = "../../modules/access_role_and_database"

  database_name               = local.database_configs["database_2"].name
  comment                     = local.database_configs["database_2"].comment
  data_retention_time_in_days = local.database_configs["database_2"].data_retention_time_in_days

  access_role_name    = local.database_configs["database_2"].access_role.name
  access_role_comment = local.database_configs["database_2"].access_role.comment

  usage_grants         = local.database_configs["database_2"].grants.usage
  create_schema_grants = local.database_configs["database_2"].grants.create_schema
  monitor_grants       = try(local.database_configs["database_2"].grants.monitor, [])

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
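Since local.database_configs is already a map keyed by file name, the per-database module blocks above could also be collapsed into a single call with for_each (supported on module blocks since Terraform 0.13). A sketch; note that the state addresses change to module.databases["database_1"] and so on:

```hcl
# One module instance per YAML file, driven by for_each
module "databases" {
  source   = "../../modules/access_role_and_database"
  for_each = local.database_configs

  database_name               = each.value.name
  comment                     = each.value.comment
  data_retention_time_in_days = each.value.data_retention_time_in_days

  access_role_name    = each.value.access_role.name
  access_role_comment = each.value.access_role.comment

  usage_grants         = each.value.grants.usage
  create_schema_grants = each.value.grants.create_schema
  # try() tolerates YAML files that omit optional grants
  monitor_grants       = try(each.value.grants.monitor, [])

  providers = {
    snowflake.sysadmin      = snowflake.sysadmin
    snowflake.securityadmin = snowflake.securityadmin
  }
}
```

With this, adding a database is just adding a YAML file, at the cost of less explicit per-resource code in main.tf.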

Using YAML allows you to reduce the code volume in main.tf while maintaining a subdirectory structure. However, syntax errors or type mismatches in YAML can't be detected until terraform plan execution, and terraform validate can't validate values within YAML. Also, dependencies in YAML are described as strings, making it harder to understand resource dependencies compared to explicit references in HCL (like module.database_1.name). These factors may complicate debugging and maintenance.

I referenced the following article for YAML-based structure:
https://datumstudio.jp/blog/0131_terraform_snowflake_role_creating/

State Separation Structure

snowflake-terraform-sample/
├── .github/
│   └── workflows/
│       ├── dev-snowflake-terraform-cicd.yml
│       └── prd-snowflake-terraform-cicd.yml
├── cfn/
│   └── create_state_gha_resources.yml
├── environments/
│   ├── dev/
│   │   ├── common/
│   │   │   ├── backend.tf
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   ├── variables.tf
│   │   │   └── versions.tf
│   │   ├── databases/
│   │   │   ├── backend.tf
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   ├── variables.tf
│   │   │   └── versions.tf
│   │   ├── file_formats/
│   │   │   ├── backend.tf
│   │   │   ├── main.tf
│   │   │   ├── outputs.tf
│   │   │   ├── variables.tf
│   │   │   └── versions.tf
│   │   └── schemas/
│   │       ├── backend.tf
│   │       ├── main.tf
│   │       ├── outputs.tf
│   │       ├── variables.tf
│   │       └── versions.tf
│   └── prd/
│       ├── common/
│       │   ├── backend.tf
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   ├── variables.tf
│       │   └── versions.tf
│       ├── databases/
│       │   ├── backend.tf
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   ├── variables.tf
│       │   └── versions.tf
│       ├── file_formats/
│       │   ├── backend.tf
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   ├── variables.tf
│       │   └── versions.tf
│       └── schemas/
│           ├── backend.tf
│           ├── main.tf
│           ├── outputs.tf
│           ├── variables.tf
│           └── versions.tf
├── modules/
│   ├── access_role_and_database/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── :

When separating states, resources with dependencies like database and schema require reference through terraform_remote_state.

environments/dev/databases/outputs.tf
output "database_1_name" {
  value = module.database_1.database_name
}
environments/dev/schemas/main.tf
# Load other state files
data "terraform_remote_state" "databases" {
  backend = "s3"
  config = {
    bucket = "terraform-state-bucket"
    key    = "snowflake/dev/databases/terraform.tfstate"  # ← Different state file
    region = "ap-northeast-1"
  }
}

module "schema_1" {
  source = "../../modules/access_role_and_schema"
  # Get value via remote_state
  database_name = data.terraform_remote_state.databases.outputs.database_1_name
  schema_name = "PUBLIC"
}

While state separation allows a subdirectory structure managed with tf files only, holding state per resource type complicates dependency management. You also need to control the apply order (common → databases → schemas), which complicates the CI/CD pipeline configuration.

Comparison

I compared each approach.

Comparison items: File Separation vs. YAML Utilization vs. State Separation

- Directory Structure
  - File separation: ⭕️ Flat; .tf files sit directly under each environment
  - YAML utilization: ⭕️ Hierarchical; YAML files organized in subdirectories
  - State separation: ⭕️ Hierarchical; directories per resource type
- Code Volume per File
  - File separation: ⚠️ High; each resource type is defined in a single tf file
  - YAML utilization: ⭕️ Low; definitions live in YAML and main.tf stays concise
  - State separation: ⭕️ Low, though outputs and data references add some code
- State Management
  - File separation: ⭕️ Single state
  - YAML utilization: ⭕️ Single state
  - State separation: ⚠️ Multiple (separated) states
- Dependency Management
  - File separation: ⭕️ Simple; module.xxx references within the same state
  - YAML utilization: ⚠️ Implicit; string references inside YAML
  - State separation: ⚠️ Complex; terraform_remote_state required
- CI/CD Configuration
  - File separation: ⭕️ Simple; completes in one execution
  - YAML utilization: ⭕️ Simple; completes in one execution
  - State separation: ⚠️ Complex; execution order must be controlled
- Performance
  - File separation: ⭕️ Standard
  - YAML utilization: ⭕️ Standard
  - State separation: ⭕️ High; only the changed parts need to be executed
- Recommended Cases
  - File separation: Few resources, single-state management
  - YAML utilization: Single-state management with a subdirectory structure desired
  - State separation: Subdirectory structure desired with tf files only
- Learning Curve
  - File separation: ⭕️ Low; standard Terraform structure
  - YAML utilization: ⚠️ Medium; the custom YAML structure must be understood
  - State separation: ⚠️ High; understanding state references is mandatory

My personal view is that if you don't expect a large number of resources, starting with the file separation structure under a single state is best. This configuration is the simplest, follows Terraform's standard usage, and keeps the learning curve low for team members. In the initial stages, file division by resource type, such as databases.tf and schemas.tf, is perfectly manageable.

If you know from the beginning that the number of resources will grow, choosing the state separation structure would be better. While state separation makes dependency management more complex, it offers benefits such as better performance and a limited blast radius in large-scale environments. If you adopt the state separation structure later, you'll need to run the terraform state mv command for each module.
https://dev.classmethod.jp/articles/terraform-move-module-state-to-other-dir/

For the YAML utilization structure, I believe the disadvantages of debugging complexity outweigh the benefits.
In conclusion, at this stage I think the approach of "starting simple and gradually increasing complexity as needed" is best.

Terragrunt is also an option. This tool specializes in managing multiple states and is a valid choice if large-scale resource management (e.g., 100+ resources) is expected from the beginning, or if the team can absorb the learning cost.
https://zenn.dev/simpleform_blog/articles/20240701-multi-account-snowflake-with-terragrunt
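For reference, a minimal sketch of what Terragrunt's shared configuration might look like (the bucket name and layout are hypothetical, not taken from the article above): a root terragrunt.hcl generates the backend file per component directory, so each state automatically gets its own key:

```hcl
# terragrunt.hcl at the repository root (illustrative)
remote_state {
  backend = "s3"
  # Generate backend.tf in each component directory at runtime
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "my-terraform-state"                              # hypothetical bucket name
    key    = "${path_relative_to_include()}/terraform.tfstate" # one state per directory
    region = "ap-northeast-1"
  }
}
```

Each component directory would then contain a small terragrunt.hcl with include { path = find_in_parent_folders() } to inherit this block.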

Finally

The content described above is just my thinking at the time of writing, and I believe it may change after years of actual operation, so please consider it as reference.
