I tried Zero-ETL integration in a cross-account environment between DynamoDB and Amazon SageMaker Lakehouse
Introduction
I'm kasama from the Data Business Division. In this article, I'd like to try Zero-ETL integration in a cross-account environment between DynamoDB and Amazon SageMaker Lakehouse.
Prerequisites

In this configuration, the DynamoDB table and the Glue Data Catalog live in different accounts; Zero-ETL integrates the data and writes it to S3, where it can then be queried through Athena.
The target for Zero-ETL integration is specified as an AWS Glue Database. In the context of SageMaker Lakehouse, this functions as a "managed catalog with S3 as storage," where data is stored in Apache Iceberg format on S3 and can be accessed via Athena. By default, Zero-ETL integration is managed by IAM/AWS Glue policies, with Lake Formation being optional, so we'll use the default configuration.
We'll follow the configuration methods described in the AWS documentation.
CloudFormation Deployment
For implementation, we'll use IaC where possible, and AWS CLI for everything else.
First, let's define the source DynamoDB table. For Zero-ETL integration, the AWS Glue service needs to perform the following operations:
- Read the DynamoDB table structure
- Export data using the Point-in-Time Recovery feature
These permissions can be granted directly through the DynamoDB table's resource-based policy.
AWSTemplateFormatVersion: "2010-09-09"
Description: Source account resources for Glue Zero-ETL (DynamoDB table with PITR).
Parameters:
  TableName:
    Type: String
    Default: cm-kasama-test-transactions
    Description: DynamoDB table name (source)
Resources:
  SourceTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Ref TableName
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: transaction_id
          AttributeType: S
        - AttributeName: timestamp
          AttributeType: N
      KeySchema:
        - AttributeName: transaction_id
          KeyType: HASH
        - AttributeName: timestamp
          KeyType: RANGE
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
      ResourcePolicy:
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Sid: AllowGlueZeroETLFromIntegration
              Effect: Allow
              Principal:
                Service: glue.amazonaws.com
              Action:
                - dynamodb:ExportTableToPointInTime
                - dynamodb:DescribeTable
                - dynamodb:DescribeExport
              Resource: "*"
              Condition:
                StringEquals:
                  aws:SourceAccount: !Ref AWS::AccountId
                StringLike:
                  aws:SourceArn: !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:integration:*
I deployed it using CloudFormation.

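For reference, deploying the source template from the CLI could look like the following. This is just a sketch; the stack name and template file name are placeholders for whatever you use locally.
# Run in the source (DynamoDB) account. Stack and file names are placeholders.
aws cloudformation deploy \
  --region ap-northeast-1 \
  --stack-name cm-kasama-zero-etl-source \
  --template-file source-dynamodb.yaml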
Next, let's define the S3 bucket, IAM role, and Glue database on the target side.
The IAM role policy is configured based on the AWS documentation.
AWSTemplateFormatVersion: "2010-09-09"
Description: Target account resources for Glue Zero-ETL (S3, Glue DB, IAM Role).
Parameters:
  GlueDatabaseName:
    Type: String
    Default: cm_kasama_zero_etl_db
    Description: Glue database name to receive data
  S3BucketName:
    Type: String
    Description: S3 bucket to store data
  TargetRoleName:
    Type: String
    Description: IAM role used by Glue on target side
Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref S3BucketName
  TargetRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Ref TargetRoleName
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: glue.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: glue-target-inline
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Sid: GlueCatalogAccess
                Effect: Allow
                Action:
                  - glue:GetDatabase
                  - glue:GetDatabases
                  - glue:GetTable
                  - glue:GetTables
                  - glue:CreateTable
                  - glue:UpdateTable
                  - glue:DeleteTable
                  - glue:CreatePartition
                  - glue:BatchCreatePartition
                  - glue:UpdatePartition
                  - glue:GetPartition
                  - glue:GetPartitions
                Resource:
                  - !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:catalog
                  - !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:database/${GlueDatabaseName}
                  - !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:table/${GlueDatabaseName}/*
              - Sid: S3Write
                Effect: Allow
                Action:
                  - s3:ListBucket
                  - s3:GetBucketLocation
                  - s3:PutObject
                  - s3:GetObject
                  - s3:DeleteObject
                Resource:
                  - !Sub arn:aws:s3:::${S3BucketName}
                  - !Sub arn:aws:s3:::${S3BucketName}/*
              - Sid: LogsAndMetrics
                Effect: Allow
                Action:
                  - cloudwatch:PutMetricData
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "*"
  GlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Sub ${AWS::AccountId}
      DatabaseInput:
        Name: !Ref GlueDatabaseName
        Description: Zero-ETL target database
        LocationUri: !Sub s3://${S3BucketName}/
I also deployed this using CloudFormation.

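For reference, the target stack could be deployed like this (a sketch; the stack name, template file name, bucket name, and role name are placeholders). Because the template creates a named IAM role, the deployment needs CAPABILITY_NAMED_IAM.
# Run in the target (Glue) account. Names below are placeholders.
aws cloudformation deploy \
  --region ap-northeast-1 \
  --stack-name cm-kasama-zero-etl-target \
  --template-file target-glue.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
      S3BucketName=<S3_BUCKET> \
      TargetRoleName=<IAM_ROLE_NAME>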
Target Side Setup with AWS CLI
Now I'll proceed with the setup using AWS CLI commands in CloudShell.
Let's start with the target side setup.
Please paste the following into CloudShell and run it:
export AWS_REGION=ap-northeast-1
# === Target: When running in Glue's CloudShell ===
# Set your account ID to TARGET_ACCOUNT_ID, manually enter source account ID (SOURCE_ACCOUNT_ID)
export TARGET_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export SOURCE_ACCOUNT_ID=<SOURCE_AWS_ACCOUNT_ID>
# Resource names (assuming common naming across accounts. Change as needed)
export DYNAMODB_TABLE=cm-kasama-test-transactions
export GLUE_DB_NAME=cm_kasama_zero_etl_db
export TARGET_ROLE=<IAM_ROLE_NAME>
export S3_BUCKET=<S3_BUCKET>
export INTEGRATION_NAME=cm-kasama-cross-account-dynamodb-glue
echo "A=${TARGET_ACCOUNT_ID} B=${SOURCE_ACCOUNT_ID} REGION=${AWS_REGION}"
Setting up Glue Resource-based Policy
Since the Glue Data Catalog's resource-based policy isn't supported by CloudFormation, we'll set it up with AWS CLI commands.
cat > catalog-resource-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowCreateInboundFromSourceAccount",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::${SOURCE_ACCOUNT_ID}:root" },
"Action": "glue:CreateInboundIntegration",
"Resource": [
"arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:catalog",
"arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME}"
]
},
{
"Sid": "AllowGlueServiceAuthorize",
"Effect": "Allow",
"Principal": { "Service": "glue.amazonaws.com" },
"Action": "glue:AuthorizeInboundIntegration",
"Resource": [
"arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:catalog",
"arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME}"
]
}
]
}
EOF
aws glue put-resource-policy \
--region "${AWS_REGION}" \
--policy-in-json file://catalog-resource-policy.json
aws glue get-resource-policy --region "${AWS_REGION}"
AllowCreateInboundFromSourceAccount allows the Zero-ETL integration creator on the data source side to create integrations against the target Glue resources. We're specifying root here, but to restrict this further, you could instead specify the IAM user or role that will execute the aws glue create-integration command on the data source side.
AllowGlueServiceAuthorize allows the AWS Glue service itself (glue.amazonaws.com) to authorize integrations on behalf of the target account.
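If you want to narrow the first statement down, the Principal can be swapped from root to the specific IAM role that will run create-integration on the source side. A sketch (the role name is a placeholder for whatever identity you actually use):
{
  "Sid": "AllowCreateInboundFromSourceAccount",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::${SOURCE_ACCOUNT_ID}:role/<INTEGRATION_CREATOR_ROLE>" },
  "Action": "glue:CreateInboundIntegration",
  "Resource": [
    "arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:catalog",
    "arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME}"
  ]
}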
Setting up create-integration-resource-property
aws glue create-integration-resource-property \
--resource-arn arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME} \
--target-processing-properties RoleArn=arn:aws:iam::${TARGET_ACCOUNT_ID}:role/${TARGET_ROLE}
aws glue get-integration-resource-property \
--resource-arn arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME}
The create-integration-resource-property command configures the target resource (in this case, the Glue Database) for Zero-ETL integration. This command completes the permissions setup that allows the integration service to access the target Glue Database and write data to S3.
I'm using the AWS CLI because, when I tried to create the Zero-ETL integration from the source account's management console, I couldn't link the target IAM role at the time of writing.
Source Side Setup with AWS CLI
Please paste the following into CloudShell and run it:
export AWS_REGION=ap-northeast-1
# === Source: When running in DynamoDB's CloudShell ===
# Set your account ID to SOURCE_ACCOUNT_ID, manually enter target account ID (TARGET_ACCOUNT_ID)
export SOURCE_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export TARGET_ACCOUNT_ID=<TARGET_ACCOUNT_ID> # ← Target (Glue) account ID
# Resource names (assuming common naming across accounts. Change as needed)
export DYNAMODB_TABLE=cm-kasama-test-transactions
export GLUE_DB_NAME=cm_kasama_zero_etl_db
export TARGET_ROLE=<IAM_ROLE_NAME>
export S3_BUCKET=<S3_BUCKET>
export INTEGRATION_NAME=cm-kasama-cross-account-dynamodb-glue
echo "TARGET_ACCOUNT_ID=${TARGET_ACCOUNT_ID} SOURCE_ACCOUNT_ID=${SOURCE_ACCOUNT_ID} REGION=${AWS_REGION}"
Inserting DynamoDB Data
Let's insert three test records using the CLI:
cat > seed-items.json <<'EOF'
{
"cm-kasama-test-transactions": [
{"PutRequest": {"Item": {
"transaction_id": {"S": "txn-001"},
"timestamp": {"N": "1735300800"},
"user_id": {"S": "user_1"},
"amount": {"N": "5000"},
"currency": {"S": "USD"},
"status": {"S": "completed"},
"merchant": {"S": "merchant_1"},
"category": {"S": "shopping"},
"location": {"M": {"country": {"S": "US"}, "city": {"S": "New York"}}}
} }},
{"PutRequest": {"Item": {
"transaction_id": {"S": "txn-002"},
"timestamp": {"N": "1735301400"},
"user_id": {"S": "user_2"},
"amount": {"N": "3500"},
"currency": {"S": "EUR"},
"status": {"S": "pending"},
"merchant": {"S": "merchant_2"},
"category": {"S": "food"},
"location": {"M": {"country": {"S": "UK"}, "city": {"S": "London"}}}
} }},
{"PutRequest": {"Item": {
"transaction_id": {"S": "txn-003"},
"timestamp": {"N": "1735302000"},
"user_id": {"S": "user_3"},
"amount": {"N": "8000"},
"currency": {"S": "JPY"},
"status": {"S": "completed"},
"merchant": {"S": "merchant_3"},
"category": {"S": "transport"},
"location": {"M": {"country": {"S": "JP"}, "city": {"S": "Tokyo"}}}
} }}
]
}
EOF
aws dynamodb batch-write-item \
--region "${AWS_REGION}" \
--request-items file://seed-items.json
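To double-check that the three items landed, an optional count scan works:
# Optional: confirm the item count after the batch write
aws dynamodb scan \
  --region "${AWS_REGION}" \
  --table-name "${DYNAMODB_TABLE}" \
  --select COUNT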
Setting up glue create-integration
aws glue create-integration \
--region "${AWS_REGION}" \
--integration-name "${INTEGRATION_NAME}" \
--source-arn arn:aws:dynamodb:${AWS_REGION}:${SOURCE_ACCOUNT_ID}:table/${DYNAMODB_TABLE} \
--target-arn arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME}
Now we actually create the Zero-ETL integration between DynamoDB and Glue using the create-integration command. We're not specifying any detailed configuration here, but you could, for example, change the CDC interval from the default of 15 minutes by specifying RefreshInterval in --integration-config, as sketched below.
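A rough sketch of that option follows; note that the exact value format for RefreshInterval is an assumption on my part (I understand it to be the refresh interval in minutes).
aws glue create-integration \
  --region "${AWS_REGION}" \
  --integration-name "${INTEGRATION_NAME}" \
  --source-arn arn:aws:dynamodb:${AWS_REGION}:${SOURCE_ACCOUNT_ID}:table/${DYNAMODB_TABLE} \
  --target-arn arn:aws:glue:${AWS_REGION}:${TARGET_ACCOUNT_ID}:database/${GLUE_DB_NAME} \
  --integration-config '{"RefreshInterval": "60"}'  # assumed format: interval in minutes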
Results
I confirmed that the integration was successfully created from the data source account.

It can also be confirmed from the target account.


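The integration status can also be checked from the CLI. A minimal sketch:
# Lists the Zero-ETL integrations visible to the account; run in either account
aws glue describe-integrations --region "${AWS_REGION}"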
Two folders were created in S3.

cm_kasama_test_transactions contains the actual data from the DynamoDB table. zetl_integration_table_state contains the status of the Zero-ETL integration.

When queried with Athena, cm_kasama_test_transactions shows the actual values from the DynamoDB table.

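For reference, the same check can be done from the CLI by submitting a query through Athena. This is a sketch: the workgroup, the query results location, and the table created by the integration are assumptions based on what the console showed.
# Result-location and query-id values are placeholders; adjust to your environment
aws athena start-query-execution \
  --region "${AWS_REGION}" \
  --work-group primary \
  --query-execution-context Database=${GLUE_DB_NAME} \
  --result-configuration OutputLocation=s3://<ATHENA_QUERY_RESULTS_BUCKET>/ \
  --query-string "SELECT transaction_id, user_id, amount, currency, status FROM cm_kasama_test_transactions ORDER BY transaction_id"
# Fetch the results with the QueryExecutionId returned above
aws athena get-query-results --region "${AWS_REGION}" --query-execution-id <QUERY_EXECUTION_ID>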
In zetl_integration_table_state, we can see the detailed status of the Zero-ETL integration.

Verifying Incremental Updates
Let's also verify incremental updates. As before, we'll manipulate data from the data source account's CloudShell.
export AWS_REGION=ap-northeast-1
export DYNAMODB_TABLE=cm-kasama-test-transactions
Update existing item (refund/amount change for txn-001)
NOW=$(date +%s)
cat > update_txn_001.json <<EOF
{
"TableName": "${DYNAMODB_TABLE}",
"Key": { "transaction_id": {"S": "txn-001"}, "timestamp": {"N": "1735300800"} },
"UpdateExpression": "SET #status = :s, #amount = :a, #updated_at = :u",
"ExpressionAttributeNames": { "#status": "status", "#amount": "amount", "#updated_at": "updated_at" },
"ExpressionAttributeValues": { ":s": {"S": "refunded"}, ":a": {"N": "5500"}, ":u": {"N": "${NOW}"} },
"ReturnValues": "ALL_NEW"
}
EOF
aws dynamodb update-item --region "${AWS_REGION}" --cli-input-json file://update_txn_001.json
Insert new item with arrays (txn-004)
cat > put_txn_004.json <<EOF
{
"TableName": "${DYNAMODB_TABLE}",
"Item": {
"transaction_id": {"S": "txn-004"},
"timestamp": {"N": "1735302600"},
"user_id": {"S": "user_1"},
"amount": {"N": "1200"},
"currency": {"S": "USD"},
"status": {"S": "completed"},
"merchant": {"S": "merchant_4"},
"category": {"S": "entertainment"},
"location": {"M": {"country": {"S": "US"}, "city": {"S": "Seattle"}}},
"payment_methods": {"L": [
{"S": "credit_card"},
{"S": "apple_pay"}
]},
"tags": {"SS": ["movie", "imax", "weekend"]}
}
}
EOF
aws dynamodb put-item --region "${AWS_REGION}" --cli-input-json file://put_txn_004.json
Delete existing item (delete txn-002)
cat > delete_txn_002.json <<EOF
{
"TableName": "${DYNAMODB_TABLE}",
"Key": { "transaction_id": {"S": "txn-002"}, "timestamp": {"N": "1735301400"} }
}
EOF
aws dynamodb delete-item --region "${AWS_REGION}" --cli-input-json file://delete_txn_002.json
I confirmed that the data was updated in DynamoDB around 9:44 on 2025/9/9.

I confirmed the updates in CloudWatch and S3 around 9:53 on 2025/9/9.


I also confirmed the updates to the cm_kasama_test_transactions table in Athena. Arrays are stored in their original state.

The zetl_integration_table_state table is also updated.

Conclusion
Since this is a relatively new feature, there may be feature additions and specification changes going forward, so please treat this article as a reference as of September 2025.
