Try reading and writing Snowflake-managed Iceberg tables from each Amazon Athena engine

Try reading and writing Snowflake-managed Iceberg tables from each Amazon Athena engine

I verified the write functionality from external engines announced at Snowflake Summit 2026 using Amazon Athena. I confirmed the behavior with Athena Spark, Athena SQL, and multiple engine generations, and have also summarized the implementation considerations.
2026.07.05

This page has been translated by machine translation. View original

Hello, this is Kitagawa from the Data Business Division.

At Snowflake Summit 2026, the general availability (GA) of writes from external engines to Snowflake-managed Apache Iceberg tables was announced. Until now, Snowflake-managed Iceberg tables could be read from external engines, but writes were not supported. From now on, external engines such as Spark can directly commit to the same tables.

This time, I verified what operations are possible from Amazon Athena using Athena for Apache Spark (hereafter Athena Spark) and Athena SQL respectively. Also, since Athena for Apache Spark has two engines — PySpark engine version 3 and Apache Spark version 3.5 — I verified both of them. Let me summarize the conclusions first.

Engine Iceberg Version Read Write
PySpark engine version 3 v2 Success Success (INSERT/UPDATE/DELETE)
PySpark engine version 3 v3 Failed (client does not support v3) — (not tested due to read failure)
Apache Spark version 3.5 v2 Success Success (INSERT/UPDATE/DELETE)
Apache Spark version 3.5 v3 Success Partial success (INSERT/DELETE only. UPDATE failed due to row lineage not supported)
Athena SQL v2 Success Not possible
Athena SQL v3 Not possible (format itself not supported) Not possible (format itself not supported)

From Athena Spark, INSERT / UPDATE / DELETE to Iceberg v2 tables all succeeded, and I confirmed that bidirectional writes from both Snowflake and Athena coexist in a single snapshot history. PySpark engine version 3 does not support Iceberg v3 tables and reading was not possible, but Apache Spark version 3.5 succeeded in reading and partially succeeded in writing as well. Athena SQL only supported reading from Iceberg v2 tables.

Background of the Feature

External engine integration has been released in stages.

Date Content
2026-02-06 Read from external engines GA
2026-03-16 Write from external engines Preview
2026-05-07 Iceberg v3 support GA
2026-05-26 Write from external engines GA
2026-06-02 Announced at Snowflake Summit 26

The entry point for the connection is the Iceberg REST Catalog API (hereafter IRC) published by Snowflake Horizon Catalog. It is based on Apache Polaris, and the endpoint is available in each account from the start.

https://<account-identifier>.snowflakecomputing.com/polaris/api/catalog

https://docs.snowflake.com/en/user-guide/tables-iceberg-access-using-external-query-engine-snowflake-horizon

https://docs.snowflake.com/en/release-notes/2026/other/2026-03-16-tables-iceberg-query-using-external-query-engine-snowflake-horizon-writes-feature

Verification Scenario

Snowflake's officially supported engines are Apache Spark, DuckDB, Apache Flink, Trino, Dremio, and PyIceberg. Athena was not explicitly listed, but Athena has two engines with different situations.

  • Athena Spark: Since it is Apache Spark + Iceberg runtime, it falls under the officially supported Apache Spark. It can connect directly to Snowflake's IRC.
  • Athena SQL: Although it is Trino-based, the catalog is limited to via Glue and cannot connect directly to IRC. To connect, it is necessary to go through Glue Data Catalog catalog federation.

The verification was structured in three parts. First, Verification 1: Athena Spark (PySpark engine version 3), which is the officially supported path, proves the correctness of the Snowflake-side configuration and the behavior of the GA feature. Next, Verification 2: Athena Spark (Apache Spark version 3.5) confirms differences due to engine generation. Finally, Verification 3: Athena SQL + Glue catalog federation is attempted.

S3 access uses credential vending. The engine only needs to add the request header X-Iceberg-Access-Delegation: vended-credentials, and Snowflake issues scoped temporary credentials. In this verification, I deliberately did not grant S3 permissions to the data bucket for the Athena Spark execution role, making the write success itself proof that credential vending is functioning.

The verification tables have a written_by column, and cross-engine writes are visualized by having each engine write its own name.

Setup

The full procedures and scripts are in the repository, so I will only introduce the key points here.

https://github.com/cm-kitagawa-zempei/snowflake-managed-iceberg-tables-write-from-athena

1. S3 Bucket and External Volume

When creating an External Volume, a trust relationship is established to allow Snowflake's IAM user to AssumeRole to an IAM role in your own account.

https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3

The procedure is as follows.

  1. Create an IAM role with a temporary trust policy
  2. CREATE EXTERNAL VOLUME (specifying a fixed value for STORAGE_AWS_EXTERNAL_ID makes subsequent steps easier)
  3. Check STORAGE_AWS_IAM_USER_ARN with DESC EXTERNAL VOLUME
  4. Update the trust policy to the real one
CREATE OR REPLACE EXTERNAL VOLUME ICEBERG_EXT_WRITE_VOL
  STORAGE_LOCATIONS = (
    (
      NAME = 'aws-s3-tokyo'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://cm-example-iceberg-ext-write/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/cm-example-extvol-role'
      STORAGE_AWS_EXTERNAL_ID = 'cm_example_iceberg_ext_write'
    )
  )
  ALLOW_WRITES = TRUE;

DESC EXTERNAL VOLUME ICEBERG_EXT_WRITE_VOL;

Connectivity verification is done with SYSTEM$VERIFY_EXTERNAL_VOLUME.

SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('ICEBERG_EXT_WRITE_VOL');
-- {"success":true,"writeResult":"PASSED","readResult":"PASSED",...}

2. Iceberg Tables

Both v2 and v3 were prepared. The external write GA covers both v2 and v3 Snowflake-managed tables.

-- ---- Iceberg v2 Table ----
CREATE OR REPLACE ICEBERG TABLE ORDERS_V2 (
  order_id   INT,
  product    STRING,
  quantity   INT,
  unit_price DECIMAL(10,2),
  ordered_at TIMESTAMP_NTZ(6),
  written_by STRING             -- Records the engine that wrote the data
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'ICEBERG_EXT_WRITE_VOL'
  BASE_LOCATION = 'orders_v2';

INSERT INTO ORDERS_V2 VALUES
  (1, 'keyboard', 2,  4500.00, '2026-07-01 09:00:00'::TIMESTAMP_NTZ, 'snowflake'),
  (2, 'mouse',    5,  1980.00, '2026-07-01 10:30:00'::TIMESTAMP_NTZ, 'snowflake'),
  (3, 'monitor',  1, 32000.00, '2026-07-01 14:15:00'::TIMESTAMP_NTZ, 'snowflake');

-- ---- Iceberg v3 Table ----
CREATE OR REPLACE ICEBERG TABLE ORDERS_V3 (
  order_id   INT,
  product    STRING,
  quantity   INT,
  unit_price DECIMAL(10,2),
  ordered_at TIMESTAMP_NTZ(6),
  written_by STRING             -- Records the engine that wrote the data
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'ICEBERG_EXT_WRITE_VOL'
  BASE_LOCATION = 'orders_v3'
  ICEBERG_VERSION = 3;

INSERT INTO ORDERS_V3 VALUES
  (1, 'desk',  1, 25000.00, '2026-07-01 09:00:00'::TIMESTAMP_NTZ, 'snowflake'),
  (2, 'chair', 2, 18000.00, '2026-07-01 11:45:00'::TIMESTAMP_NTZ, 'snowflake');

3. Service User and PAT

A service user and Programmatic Access Token (PAT) are prepared for the external engine. For writing, all of SELECT / INSERT / UPDATE / DELETE / TRUNCATE on the target table, as well as USAGE on the External Volume are required.

A network policy is in principle required for service user PATs, but Horizon IRC does not support user-level network policies. Therefore, in this verification, the requirements were relaxed using an authentication policy (this is a point that requires consideration in production).

CREATE AUTHENTICATION POLICY HORIZON_PAT_AUTH_POLICY
  PAT_POLICY = (NETWORK_POLICY_EVALUATION = ENFORCED_NOT_REQUIRED);

ALTER USER HORIZON_ATHENA_SVC SET AUTHENTICATION POLICY HORIZON_PAT_AUTH_POLICY;

ALTER USER HORIZON_ATHENA_SVC ADD PROGRAMMATIC ACCESS TOKEN ATHENA_VERIFICATION_PAT
  ROLE_RESTRICTION = 'HORIZON_EXT_RW_ROLE'
  DAYS_TO_EXPIRY = 7;

The PAT can be used as-is, or exchanged for an access token via the OAuth token endpoint.

curl -X POST "https://<account-identifier>.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens" \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=client_credentials' \
  --data-urlencode 'scope=session:role:HORIZON_EXT_RW_ROLE' \
  --data-urlencode 'client_secret=<PAT>'

Verification 1: Athena Spark (PySpark engine version 3)

Connection Configuration

When starting an Athena Spark session, define an Iceberg REST catalog pointing to Snowflake's IRC in the Spark properties. Here is the final working configuration.

{
  "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
  "spark.sql.catalog.horizon": "org.apache.iceberg.spark.SparkCatalog",
  "spark.sql.catalog.horizon.type": "rest",
  "spark.sql.catalog.horizon.uri": "https://<account-identifier>.snowflakecomputing.com/polaris/api/catalog",
  "spark.sql.catalog.horizon.warehouse": "ICEBERG_EXT_WRITE_DB",
  "spark.sql.catalog.horizon.credential": "<PAT>",
  "spark.sql.catalog.horizon.scope": "session:role:HORIZON_EXT_RW_ROLE",
  "spark.sql.catalog.horizon.header.X-Iceberg-Access-Delegation": "vended-credentials",
  "spark.sql.catalog.horizon.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
  "spark.sql.iceberg.handle-timestamp-without-timezone": "true",
  "spark.sql.iceberg.vectorization.enabled": "false",
  "spark.sql.shuffle.partitions": "1"
}

Specify the Snowflake database name for warehouse. Spark can then access it in the format horizon.<schema>.<table>.

Note that the verification was executed from the CLI using aws athena start-session / start-calculation-execution rather than the management console notebook, in a reproducible form.

The last three lines of the configuration were not known from the start, but were added one by one as problems were encountered during verification.

Operational Verification

The verification script (verify_horizon_write.py) confirms operation in the following flow.

  1. Output environment information (Spark version, Iceberg version)
  2. Confirm that Snowflake-side schemas are visible with SHOW NAMESPACES IN horizon
  3. Read the initial data (written_by:snowflake) from the v2 table
  4. INSERT into the v2 table
  5. UPDATE the v2 table
  6. DELETE from the v2 table
  7. Re-read the v2 table after writing and confirm row counts by written_by
  8. Reference the snapshot history (.snapshots metadata table) of the v2 table and confirm external commits
  9. Read the v3 table
  10. INSERT into the v3 table

For the Iceberg v3 table, an error Cannot read unsupported version 3 occurred at step 9 during reading. The INSERT verification was also performed as a precaution, but the result was the same, so UPDATE / DELETE verification was not performed.

Execution log excerpt
============================================================
### 0. Environment Information
============================================================
Spark version: 3.2.1-amzn-0
Iceberg version: 1.4.0-SNAPSHOT

============================================================
### 1. Namespace list (confirming Snowflake schemas are visible)
============================================================
+---------+
|namespace|
+---------+
|ANALYTICS|
+---------+

============================================================
### 2. Read v2 table (initial data = written_by:snowflake)
============================================================
+--------+--------+--------+----------+-------------------+----------+
|ORDER_ID|PRODUCT |QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY|
+--------+--------+--------+----------+-------------------+----------+
|1       |keyboard|2       |4500.00   |2026-07-01 09:00:00|snowflake |
|2       |mouse   |5       |1980.00   |2026-07-01 10:30:00|snowflake |
|3       |monitor |1       |32000.00  |2026-07-01 14:15:00|snowflake |
+--------+--------+--------+----------+-------------------+----------+


============================================================
### 3. INSERT into v2 table (main verification of external engine writes)
============================================================
INSERT succeeded

============================================================
### 4. UPDATE v2 table
============================================================
UPDATE succeeded

============================================================
### 5. DELETE from v2 table (delete only 1 row, leave 201 for Snowflake-side verification)
============================================================
DELETE succeeded

============================================================
### 6. Read v2 table (after writing)
============================================================
+--------+--------+--------+----------+-------------------+------------+
|ORDER_ID|PRODUCT |QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY  |
+--------+--------+--------+----------+-------------------+------------+
|1       |keyboard|2       |4500.00   |2026-07-01 09:00:00|snowflake   |
|2       |mouse   |5       |1980.00   |2026-07-01 10:30:00|snowflake   |
|3       |monitor |1       |32000.00  |2026-07-01 14:15:00|snowflake   |
|201     |ssd-1tb |2       |12800.00  |2026-07-02 10:00:00|athena-spark|
+--------+--------+--------+----------+-------------------+------------+

+------------+---------+
|written_by  |row_count|
+------------+---------+
|athena-spark|1        |
|snowflake   |3        |
+------------+---------+


============================================================
### 7. Snapshot history of v2 table (confirming external commits)
============================================================
+-----------------------+-------------------+---------+
|committed_at           |snapshot_id        |operation|
+-----------------------+-------------------+---------+
|2026-07-04 12:47:05.39 |7325365669594828787|append   |
|2026-07-04 12:49:41.525|3821238997686183694|append   |
|2026-07-04 12:49:46.974|1167080387288909088|overwrite|
|2026-07-04 12:49:50.239|8203787599929521889|delete   |
+-----------------------+-------------------+---------+


============================================================
### 8. Read v3 table (verification of v3 support on Athena Spark side)
============================================================
v3 read failed: An error occurred while calling o44.sql.
: org.apache.iceberg.exceptions.RESTException: Received a success response code of 200, but failed to parse response body into LoadTableResponse
        ...(97 lines of stack trace omitted)...
Caused by: org.apache.iceberg.shaded.com.fasterxml.jackson.databind.JsonMappingException: Cannot read unsupported version 3 (through reference chain: org.apache.iceberg.rest.responses.LoadTableResponse["metadata"])
        ...(11 lines of stack trace omitted)...
Caused by: java.lang.IllegalArgumentException: Cannot read unsupported version 3
        ...(8 lines of stack trace omitted)...


============================================================
### 9. INSERT into v3 table (verification of v3 external write capability)
============================================================
v3 INSERT failed: An error occurred while calling o44.sql.
: org.apache.iceberg.exceptions.RESTException: Received a success response code of 200, but failed to parse response body into LoadTableResponse
        ...(79 lines of stack trace omitted)...
Caused by: org.apache.iceberg.shaded.com.fasterxml.jackson.databind.JsonMappingException: Cannot read unsupported version 3 (through reference chain: org.apache.iceberg.rest.responses.LoadTableResponse["metadata"])
        ...(11 lines of stack trace omitted)...
Caused by: java.lang.IllegalArgumentException: Cannot read unsupported version 3
        ...(8 lines of stack trace omitted)...

When the session information was output, the PySpark engine version 3 runtime was Spark 3.2.1 + Iceberg 1.4.0 (as of July 2026). Due to the age of this runtime, three problems were encountered in sequence. The problems and solutions are documented below.

Cannot read NTZ timestamps

The initial SELECT failed.

pyspark.sql.utils.IllegalArgumentException: Cannot handle timestamp without timezone fields in Spark.
Spark does not natively support this type but if you would like to handle all timestamps
as timestamp with timezone set 'spark.sql.iceberg.handle-timestamp-without-timezone' to true.

Since the TIMESTAMP_NTZ type does not exist in Spark 3.2 (introduced in 3.4), Iceberg's timestamp (without timezone) columns cannot be handled. This was resolved by setting spark.sql.iceberg.handle-timestamp-without-timezone = true as directed by the error message. The stored values do not change; only the interpretation on the Spark side changes.

Since the TIMESTAMP_NTZ type was introduced in Spark 3.4, this setting itself is no longer needed in subsequent Iceberg integrations.

https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java

https://spark.apache.org/releases/spark-release-3-4-0.html

Cannot read Parquet written by Snowflake

The data file read failed.

java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [ORDER_ID]
optional int32 ORDER_ID (INTEGER(32,true)) = 1 with encoding DELTA_BINARY_PACKED.
Disable vectorized reads to read this table/file

Parquet files written by Snowflake include DELTA_BINARY_PACKED encoding (Parquet V2 series), which the vectorized reader (Arrow) in Iceberg 1.4 does not support. This was resolved by falling back to row-based reading with spark.sql.iceberg.vectorization.enabled = false. Performance is reduced but there is no impact on verification. This setting is documented in the Iceberg official documentation under Spark Configuration (Spark SQL Options).

https://iceberg.apache.org/docs/latest/spark-configuration/#spark-sql-options

https://qiita.com/manabian/items/fc89b81b327545da9f69

UPDATE fails with 403 on s3:DeleteObject

INSERT succeeded, but UPDATE failed.

software.amazon.awssdk.services.s3.model.S3Exception:
User: arn:aws:sts::123456789012:assumed-role/cm-example-extvol-role/snowflake
is not authorized to perform: s3:DeleteObject on resource: ".../orders_v2.XXXX/data/00002-....parquet"
because no session policy allows the s3:DeleteObject action

Since the error subject is assumed-role/.../snowflake, credential vending itself is functioning and temporary credentials with Snowflake assuming the External Volume role are being used. The problem is at the end no session policy allows — although DeleteObject has been granted on the IAM role side, the session policy applied by Snowflake during vending only includes Put/Get/List and does not include Delete.

Meanwhile, Iceberg's Spark writer may issue DeleteObject when cleaning up empty data files that were closed with zero rows or during task interruption cleanup. Physical deletion of committed data files is the role of Snowflake-side maintenance, so this appears to be an intentional scope restriction, but there are cases where it conflicts with the writer's internal behavior. This behavior was not found in the documentation.

Two countermeasures can be considered.

  1. Prevent empty files from being generated: Copy-on-write write tasks are distributed per data file to be rewritten, so if the number of target files is less than the number of write tasks, zero-row tasks, i.e., empty files, will be generated. This is unrelated to row count — I confirmed through actual measurement that the same 403 occurs even with a table of 20,000 rows that Snowflake wrote to a single file. Setting spark.sql.shuffle.partitions = 1 to consolidate write tasks into one avoids the trigger for issuing DeleteObject in the first place. This resolved the issue in this case.
  2. Allow directly via bucket policy: Grant s3:DeleteObject directly to the session principal (arn:aws:sts::...:assumed-role/<role-name>/snowflake) using a resource-based policy. This utilizes the IAM specification that direct grants to session principals are not subject to session policy restrictions. However, since this relaxes the scope that Snowflake intentionally restricts, use this approach with careful consideration.

The behavior that "write tasks are distributed per file to be rewritten," which is the premise for countermeasure 1, can be confirmed in the Iceberg 1.4.0 source. The distribution mode for copy-on-write UPDATE / DELETE is hash by default (SparkWriteConf), and the clustering key for hash distribution is the _file metadata column representing the source data file of the row (SparkDistributionAndOrderingUtil).

https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java

https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkDistributionAndOrderingUtil.java

v2 Table

After the fixes, the execution succeeded for read → INSERT → UPDATE → DELETE. Here is the snapshot history.

+-----------------------+-------------------+---------+
|committed_at           |snapshot_id        |operation|
+-----------------------+-------------------+---------+
|2026-07-04 12:47:05.39 |7325365669594828787|append   |  <- Snowflake: initial data INSERT
|2026-07-04 12:49:41.525|3821238997686183694|append   |  <- Athena Spark: INSERT
|2026-07-04 12:49:46.974|1167080387288909088|overwrite|  <- Athena Spark: UPDATE
|2026-07-04 12:49:50.239|8203787599929521889|delete   |  <- Athena Spark: DELETE
+-----------------------+-------------------+---------+

Following the initial data created by Snowflake (the first append), it can be seen that Athena Spark's commits are accumulated in the same Iceberg history.
The external engine writes are also immediately visible from the Snowflake side.

SELECT written_by, COUNT(*) AS row_count FROM ORDERS_V2 GROUP BY written_by;
-- WRITTEN_BY    ROW_COUNT
-- athena-spark  1
-- snowflake     3

v3 Table

The v3 table failed at the read stage.

Caused by: java.lang.IllegalArgumentException: Cannot read unsupported version 3
	at org.apache.iceberg.TableMetadataParser.fromJson(...)

Snowflake (server side) returns metadata with HTTP 200, and the Athena Spark Iceberg 1.4.0 client rejects it at the parsing stage. This means it is not a Snowflake-side restriction, but rather the engine-side client's lack of v3 support (v3 support is available from Iceberg 1.8 onwards).

Verification 2: Athena Spark (Apache Spark version 3.5)

The Apache Spark version 3.5 engine is based on EMR 7.12 with Spark 3.5.6 + Iceberg 1.10.0, which is significantly newer than the Verification 1 runtime (Spark 3.2.1 + Iceberg 1.4.0), so I also confirmed whether the issues encountered in Verification 1 would be resolved.

First, as a premise, Apache Spark version 3.5 has a different execution method from Verification 1. The management console notebook and start-calculation-execution used in Verification 1 cannot be used, and the entry point for code execution is Spark Connect. Since existing workgroup engine changes are also not possible, a dedicated new workgroup was created.

https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-release-versions.html

Connecting via Spark Connect

Start a session, obtain the endpoint URL and authentication token via the GetSessionEndpoint API, and connect from a local pyspark[connect] client.

athena = boto3.client("athena", region_name="ap-northeast-1")
session_id = athena.start_session(
    WorkGroup="cm-example-spark35-wg",
    EngineConfiguration={"MaxConcurrentDpus": 20},
)["SessionId"]
# (Wait until session becomes IDLE)

resp = athena.get_session_endpoint(SessionId=session_id)
url = (resp["EndpointUrl"].replace("https", "sc", 1)
       + f":443/;use_ssl=true;x-aws-proxy-auth={resp['AuthToken']}")

spark = SparkSession.builder.remote(url).getOrCreate()

Since SparkProperties cannot be used with StartSession, the Horizon catalog definition is injected from the client side after connecting.

spark.conf.set("spark.sql.catalog.horizon", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.horizon.type", "rest")
# ...subsequent catalog settings same as Verification 1, configured via spark.conf.set

Operational Verification

The verification script (verify_horizon_write_35.py) follows the same flow as Verification 1, but intentionally removes the three workarounds that were required in Verification 1 (NTZ timestamps, vectorized reads, shuffle partitions) to confirm how much is resolved by Iceberg 1.10.0. Additionally, for v3 tables, UPDATE / DELETE operations that failed at read time in Verification 1 and could not be tested are now also verified.

  1. Output environment information (Spark version. Since the Iceberg version on the JVM side cannot be obtained from the Spark Connect client, it is recorded as 1.10.0, the documented value bundled with EMR 7.12, which is the engine base)
  2. Confirm that Snowflake-side schemas are visible via SHOW NAMESPACES IN horizon
  3. Read from v2 table
  4. INSERT into v2 table
  5. UPDATE v2 table
  6. DELETE from v2 table
  7. Re-read v2 table after writing and check row count by written_by
  8. Reference snapshot history of v2 table and confirm external commits
  9. Read from v3 table
  10. INSERT into v3 table
  11. UPDATE v3 table
  12. DELETE from v3 table
Execution Log
============================================================
### 0. Environment Information
============================================================
Spark version: 3.5.6-amzn-1
Iceberg version: 1.10.0 (Bundled with EMR 7.12 · documented value)

============================================================
### 1. Namespace List (Snowflake schemas should be visible)
============================================================
+---------+
|namespace|
+---------+
|ANALYTICS|
+---------+

============================================================
### 2. Read from v2 table (no NTZ workaround)
============================================================
+----------+-------------+-------+
|col_name  |data_type    |comment|
+----------+-------------+-------+
|ORDER_ID  |int          |NULL   |
|PRODUCT   |string       |NULL   |
|QUANTITY  |int          |NULL   |
|UNIT_PRICE|decimal(10,2)|NULL   |
|ORDERED_AT|timestamp_ntz|NULL   |
|WRITTEN_BY|string       |NULL   |
+----------+-------------+-------+

+--------+--------+--------+----------+-------------------+------------+
|ORDER_ID|PRODUCT |QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY  |
+--------+--------+--------+----------+-------------------+------------+
|1       |keyboard|2       |4500.00   |2026-07-01 09:00:00|snowflake   |
|2       |mouse   |5       |1980.00   |2026-07-01 10:30:00|snowflake   |
|3       |monitor |1       |32000.00  |2026-07-01 14:15:00|snowflake   |
|201     |ssd-1tb |2       |12800.00  |2026-07-02 10:00:00|athena-spark|
+--------+--------+--------+----------+-------------------+------------+


============================================================
### 3. INSERT into v2 table
============================================================
INSERT succeeded

============================================================
### 4. UPDATE v2 table (no shuffle.partitions=1)
============================================================
UPDATE succeeded

============================================================
### 5. DELETE from v2 table
============================================================
DELETE succeeded

============================================================
### 6. Read from v2 table (after writing)
============================================================
+--------+--------+--------+----------+-------------------+--------------------+
|ORDER_ID|PRODUCT |QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY          |
+--------+--------+--------+----------+-------------------+--------------------+
|1       |keyboard|2       |4500.00   |2026-07-01 09:00:00|snowflake           |
|2       |mouse   |5       |1980.00   |2026-07-01 10:30:00|snowflake           |
|3       |monitor |1       |32000.00  |2026-07-01 14:15:00|snowflake           |
|201     |ssd-1tb |2       |12800.00  |2026-07-02 10:00:00|athena-spark        |
|301     |usb-hub |2       |3400.00   |2026-07-02 11:00:00|athena-spark-connect|
+--------+--------+--------+----------+-------------------+--------------------+

+--------------------+---------+
|written_by          |row_count|
+--------------------+---------+
|athena-spark        |1        |
|athena-spark-connect|1        |
|snowflake           |3        |
+--------------------+---------+


============================================================
### 7. Snapshot history of v2 table
============================================================
+-----------------------+-------------------+---------+
|committed_at           |snapshot_id        |operation|
+-----------------------+-------------------+---------+
|2026-07-04 12:47:05.39 |7325365669594828787|append   |
|2026-07-04 12:49:41.525|3821238997686183694|append   |
|2026-07-04 12:49:46.974|1167080387288909088|overwrite|
|2026-07-04 12:49:50.239|8203787599929521889|delete   |
|2026-07-04 12:52:21.215|6867292432401266181|append   |
|2026-07-04 12:52:25.633|2458057220796135436|overwrite|
|2026-07-04 12:52:28.405|8848674517559331220|delete   |
+-----------------------+-------------------+---------+


============================================================
### 8. Read from v3 table (v3 support verification with Iceberg 1.10)
============================================================
+--------+-------+--------+----------+-------------------+----------+
|ORDER_ID|PRODUCT|QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY|
+--------+-------+--------+----------+-------------------+----------+
|1       |desk   |1       |25000.00  |2026-07-01 09:00:00|snowflake |
|2       |chair  |2       |18000.00  |2026-07-01 11:45:00|snowflake |
+--------+-------+--------+----------+-------------------+----------+

v3 read succeeded

============================================================
### 9. INSERT into v3 table
============================================================
v3 INSERT succeeded
+--------+--------+--------+----------+-------------------+--------------------+
|ORDER_ID|PRODUCT |QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY          |
+--------+--------+--------+----------+-------------------+--------------------+
|1       |desk    |1       |25000.00  |2026-07-01 09:00:00|snowflake           |
|2       |chair   |2       |18000.00  |2026-07-01 11:45:00|snowflake           |
|111     |desk-mat|1       |2200.00   |2026-07-02 11:10:00|athena-spark-connect|
+--------+--------+--------+----------+-------------------+--------------------+


============================================================
### 10. UPDATE v3 table (row-level update)
============================================================
FAILED: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 21.0 failed 4 times, most recent failure: Lost task 0.3 in stage 21.0 (TID 20) ([2406:da14:910:b315:4dba:bb4:cd82:78bd] executor 1): java.lang.IllegalArgumentException: Structs do not match: StructType(StructField(ORDER_ID,IntegerType,true),StructField(PRODUCT,StringType,true),StructField(QUANTITY,IntegerType,true),StructField(UNIT_PRICE,DecimalType(10,2),true),StructField(ORDERED_AT,TimestampNTZType,true),StructField(WRITTEN_BY,StringType,true)) and message table {
  optional int32 ORDER_ID = 1;
  optional binary PRODUCT (STRING) = 2;
  optional int32 QUANTITY = 3;
  optional int64 UNIT_PRICE (DECIMAL(10,2)) = 4;
  optional int64 ORDERED_AT (TIMESTAMP(MICROS,false)) = 5;
  optional binary WRITTEN_BY (STRING) = 6;
  optional int64 _row_id = 2147483540;
  optional int64 _last_updated_sequence_number = 2147483539;
}

        ...(13 lines of stack trace omitted)...

============================================================
### 11. DELETE from v3 table (row-level delete / deletion vectors)
============================================================
v3 DELETE succeeded
+--------+-------+--------+----------+-------------------+----------+
|ORDER_ID|PRODUCT|QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY|
+--------+-------+--------+----------+-------------------+----------+
|1       |desk   |1       |25000.00  |2026-07-01 09:00:00|snowflake |
|2       |chair  |2       |18000.00  |2026-07-01 11:45:00|snowflake |
+--------+-------+--------+----------+-------------------+----------+

v2 Table

Reads and INSERT / UPDATE / DELETE operations on the v2 table all succeeded. The necessity of workarounds that were required in PySpark engine version 3 is as follows.

Verification 1 (Spark 3.2.1 + Iceberg 1.4.0) Verification 2 (Spark 3.5.6 + Iceberg 1.10.0)
handle-timestamp-without-timezone Required Not required (native TIMESTAMP_NTZ support)
vectorization.enabled = false Required Still required
shuffle.partitions = 1 (DeleteObject 403 workaround) Required Not required (403 not reproduced)

Only the vectorized read issue remains. However, the content of the error has changed.

java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [PRODUCT]
optional binary PRODUCT (STRING) = 2 with encoding DELTA_LENGTH_BYTE_ARRAY.
Disable vectorized reads to read this table/file

In Verification 1, the failure occurred with DELTA_BINARY_PACKED on integer columns, which has been resolved in Iceberg 1.10. Now it hits DELTA_LENGTH_BYTE_ARRAY, which Snowflake uses for string columns. This is reportedly supported from Iceberg 1.11, so the expectation is that it will be resolved with one more generation of runtime updates.

v3 Table

The v3 table could not be read in Verification 1, but Iceberg 1.10.0 successfully reads v3 tables, and the inserted rows are also visible.

+--------+--------+--------+----------+-------------------+--------------------+
|ORDER_ID|PRODUCT |QUANTITY|UNIT_PRICE|ORDERED_AT         |WRITTEN_BY          |
+--------+--------+--------+----------+-------------------+--------------------+
|1       |desk    |1       |25000.00  |2026-07-01 09:00:00|snowflake           |
|2       |chair   |2       |18000.00  |2026-07-01 11:45:00|snowflake           |
|111     |desk-mat|1       |2200.00   |2026-07-02 11:10:00|athena-spark-connect|
+--------+--------+--------+----------+-------------------+--------------------+

DELETE also succeeded. However, only UPDATE fails.

java.lang.IllegalArgumentException: Structs do not match:
StructType(StructField(ORDER_ID,IntegerType,true), ..., StructField(WRITTEN_BY,StringType,true))
and message table {
  optional int32 ORDER_ID = 1;
  ...
  optional binary WRITTEN_BY (STRING) = 6;
  optional int64 _row_id = 2147483540;
  optional int64 _last_updated_sequence_number = 2147483539;
}

_row_id and _last_updated_sequence_number appear at the end of the write schema. These are row lineage metadata columns that became mandatory in Iceberg v3, and the Iceberg integration for Spark 3.5 cannot yet handle rewriting rows that include these columns. DELETE succeeds because it only writes a deletion marker (deletion vectors) on existing rows without rewriting them, while UPDATE fails because modified rows need to be rewritten with lineage columns included. Row lineage support is coming from the Spark 4.x series of Iceberg integration, so full read/write support for v3 is still a bit further off.

Verification 3: Athena SQL + Glue Catalog Federation

Next is read/write from Athena SQL. Since Athena SQL cannot connect directly to IRC, Glue Data Catalog federation is used as an intermediary.

https://docs.aws.amazon.com/lake-formation/latest/dg/catalog-federation-snowflake.html

Setup Key Points

The setup flow is as follows.

  1. Exchange PAT for an access token and store it in Secrets Manager in the format {"BEARER_TOKEN": "<token>"}
  2. Create an IAM role for Glue / Lake Formation (secret read access + S3 access for table data)
  3. Create a Glue connection (SNOWFLAKEICEBERGRESTCATALOG · CUSTOM authentication)
  4. Register the connection as a resource in Lake Formation (--with-federation --with-privileged-access)
  5. Create a federated catalog (specify the Snowflake database name as the Identifier)
  6. Grant table permissions to the query-executing principal in Lake Formation

The Bearer token stored in Secrets Manager expires in approximately one hour, so a token auto-renewal mechanism is required for long-term operation.

Results

Reads succeeded. Rows written by Athena Spark in Verification 1 and Verification 2 are also visible through the federation.

SELECT written_by, COUNT(*) AS row_count
FROM "snowflake_horizon_cat"."analytics"."orders_v2"
GROUP BY written_by;
-- athena-spark          1
-- athena-spark-connect  1
-- snowflake             3

On the other hand, INSERT failed with a permission error.

com.amazonaws.services.lakeformation.model.AccessDeniedException:
Principal does not have any privilege on specified resource

While it appears to be failing due to a Lake Formation error, granting all permissions at the database level in addition to table permissions (SELECT / INSERT / DELETE / ALTER / DESCRIBE) did not change the result.

Both AWS and Snowflake official documentation were reviewed, but no explicit statement was found saying "writes via Glue catalog federation are not supported by design." Both companies' documentation and blogs consistently refer only to "queries (reads)" and there is no mention of writes at all.

https://docs.aws.amazon.com/lake-formation/latest/dg/catalog-federation-snowflake.html

https://aws.amazon.com/blogs/big-data/access-snowflake-horizon-catalog-data-using-catalog-federation-in-the-aws-glue-data-catalog/

https://docs.snowflake.com/en/user-guide/tables-iceberg-access-using-external-query-engine-snowflake-horizon

v3 Table: Also Not Possible on the Engine Side

The v3 table failed with the same error for both reads and writes.

GENERIC_INTERNAL_ERROR: Iceberg format version 3 is not supported

Athena SQL returns a clear message indicating v3 is not supported. As of July 2026, the only way to access Iceberg v3 tables from Athena is through the Apache Spark version 3.5 engine confirmed in Verification 2.

Summary

Here is a summary of the verifications conducted so far.

  • With Athena Spark, INSERT / UPDATE / DELETE operations to Snowflake-managed Iceberg tables (v2) all succeeded, confirming the GA of external engine writes. This works on both PySpark engine version 3 and Apache Spark version 3.5 engines.
  • Commits from both engines coexist in a single snapshot history, and each engine's writes are immediately visible to the other.
  • Thanks to credential vending, reads and writes were possible without granting S3 permissions to the Iceberg data area to the engine-side execution role.
  • There were three issues caused by PySpark engine version 3 (Spark 3.2.1 + Iceberg 1.4.0): NTZ timestamps, DELTA_BINARY_PACKED, and the issue where vended credentials do not include s3:DeleteObject. All are avoidable via session configuration, and with Apache Spark version 3.5 (Spark 3.5.6 + Iceberg 1.10.0), only the vectorized read issue remains (the target encoding has changed to DELTA_LENGTH_BYTE_ARRAY).
  • Iceberg v3 tables are not available for reading or writing in PySpark engine version 3 or Athena SQL, but with Apache Spark version 3.5, reads, INSERT, and DELETE have been confirmed to work. UPDATE is expected to be resolved in the Spark 4.x series, which will include row lineage support.
  • Athena SQL + Glue catalog federation is read-only. Writes fail with a permission-error-like message due to Lake Formation vending rejection. While there is no explicit statement in the official documentation, the results of hands-on verification suggest this is a by-design limitation.

When writing to Snowflake-managed Iceberg tables from the AWS side, the current approach is to connect directly to Horizon IRC from a Spark-based engine. For Athena, the usage pattern would be PySpark engine version 3 for easy use via the console notebook, and Apache Spark version 3.5 + Spark Connect when v3 tables or newer runtimes are needed. It seems best to treat federation as a mechanism for read integration and use direct IRC connections for writes.


Snowflakeの導入支援はクラスメソッドに!

クラスメソッドでは Snowflake の導入を支援しております。
製品の詳細や支援の内容についてお気軽にお問い合わせください。

Snowflakeの詳細を見る

Share this article