Trying CRUD Operations on the New S3 Annotations Feature and Cross-Search with Annotation Table + Athena
Introduction
On June 16, 2026, a new Amazon S3 feature called "S3 Annotations" was announced. It lets you attach up to 1,000 custom metadata items (annotations) to an S3 object, each up to 1 MB in size.
A comparison with conventional S3 metadata is shown below.
| Item | user-defined metadata | object tags | S3 Annotations |
|---|---|---|---|
| Max count | — | 10 | 1,000 |
| Size limit | 2 KB within the request header | Key max 128 chars, value max 256 chars | 1 MB each |
| Format | Key-Value (ASCII) | Key-Value | Any (JSON, text, etc.) |
| Update | Object re-PUT required | Individual update possible | Individual update possible |
| Lifecycle integration | — | ✅ | — |
| Access control integration | — | ✅ (condition keys) | — |
| Cross-search (Athena) | — | Via S3 Inventory | Annotation Table |
Tags remain the mechanism for lifecycle rules and IAM condition key integration. Annotations do not cover those uses, so they are not a superset of tags.
This article covers CRUD operations for Annotations using boto3 and the AWS CLI, enables the Annotation Table, and performs cross-search using Athena.
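The limits in the comparison table can be checked on the client side before calling the API. The following is a minimal sketch and was not part of the original verification: the constant names and `check_annotation_limits` are hypothetical, and it assumes that "1 MB" means 1,048,576 bytes, which the announcement as quoted here does not specify.

```python
# Hypothetical client-side pre-check for the limits quoted in the table above.
MAX_ANNOTATIONS_PER_OBJECT = 1_000
# Assumption: "1 MB" is treated as 1 MiB (1,048,576 bytes).
MAX_ANNOTATION_BYTES = 1024 * 1024


def check_annotation_limits(payload: bytes, existing_count: int) -> list[str]:
    """Return a list of problems for a NEW annotation; an empty list means OK.

    existing_count is the number of annotations already attached to the object.
    """
    problems = []
    if len(payload) > MAX_ANNOTATION_BYTES:
        problems.append(
            f"payload is {len(payload)} bytes, limit is {MAX_ANNOTATION_BYTES}"
        )
    if existing_count >= MAX_ANNOTATIONS_PER_OBJECT:
        problems.append(
            f"object already has {existing_count} annotations, "
            f"limit is {MAX_ANNOTATIONS_PER_OBJECT}"
        )
    return problems


if __name__ == "__main__":
    print(check_annotation_limits(b'{"project": "annotation-test"}', existing_count=2))
```

The count check only applies when adding a new annotation name. Overwriting an existing name would presumably not increase the count, so a caller that updates in place can pass `existing_count - 1`.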
Verification Environment
| Item | Value |
|---|---|
| Region | ap-northeast-1 |
| Bucket type | General-purpose S3 bucket (versioning disabled) |
| Python | 3.14 (rc1) |
| boto3 | 1.43.31 (released 2026-06-16) |
| botocore | 1.43.31 |
| AWS CLI | v2.35.6 (released 2026-06-17) |
| Athena | Engine version 3 |
The operations in this article are performed using boto3 and the AWS CLI.
Annotation CRUD Operations (boto3)
Annotation-related methods available in boto3 1.43.31:
import boto3
s3 = boto3.client('s3', region_name='ap-northeast-1')
[m for m in dir(s3) if 'annot' in m.lower()]
# ['delete_object_annotation', 'get_object_annotation', 'list_object_annotations',
# 'put_object_annotation', 'update_bucket_metadata_annotation_table_configuration']
The following operations are performed with the test object test-object.txt already placed in the bucket.
import json
BUCKET = "my-annotation-demo-bucket"
KEY = "test-object.txt"
PutObjectAnnotation (JSON)
annotation_json = json.dumps({
"project": "annotation-test",
"owner": "demo-user",
"created": "2026-06-17"
})
resp = s3.put_object_annotation(
Bucket=BUCKET,
Key=KEY,
AnnotationName='test-metadata',
AnnotationPayload=annotation_json.encode()
)
{
'ETag': '"1fa459dad748f9fcc3be1e3dcc50ea82"',
'Key': 'test-object.txt',
'AnnotationName': 'test-metadata',
'ResponseMetadata': {
'RequestId': 'XXXXXXXXXXXX',
'HostId': 'XXXXXXXXXXXX',
'HTTPStatusCode': 200
}
}
ResponseMetadata will be omitted hereafter.
PutObjectAnnotation (Plain Text)
resp = s3.put_object_annotation(
Bucket=BUCKET,
Key=KEY,
AnnotationName='ai-summary',
AnnotationPayload=b'AI-generated summary: A test file for demonstrating S3 Annotations.'
)
{
'ETag': '"403c26f2a55cdc54cf931b03be006b75"',
'AnnotationName': 'ai-summary'
}
ListObjectAnnotations
resp = s3.list_object_annotations(Bucket=BUCKET, Key=KEY)
{
'AnnotationCount': 2,
'Annotations': [
{
'AnnotationName': 'ai-summary',
'Size': 67,
'ETag': '"403c26f2a55cdc54cf931b03be006b75"',
'LastModified': datetime(2026, 6, 17, 1, 37, 36, tzinfo=tzutc()),
'ChecksumAlgorithm': ['CRC32']
},
{
'AnnotationName': 'test-metadata',
'Size': 78,
'ETag': '"1fa459dad748f9fcc3be1e3dcc50ea82"',
'LastModified': datetime(2026, 6, 17, 1, 37, 36, tzinfo=tzutc()),
'ChecksumAlgorithm': ['CRC32']
}
]
}
The ETag in the List response is each annotation's own ETag, not the ETag of the object itself.
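When working with many annotations, it can be convenient to index the List response by annotation name. The following is a small sketch that operates only on a response dict of the shape shown above. `index_annotations` is a hypothetical helper, and the sample values are copied from this verification.

```python
from datetime import datetime, timezone


def index_annotations(list_response: dict) -> dict[str, dict]:
    """Map annotation name -> {'size', 'etag'} from a List response dict."""
    index = {}
    for item in list_response.get("Annotations", []):
        index[item["AnnotationName"]] = {
            "size": item["Size"],
            # The ETag string carries its own double quotes; strip them.
            "etag": item["ETag"].strip('"'),
        }
    return index


# Sample shaped like the response shown above.
sample = {
    "AnnotationCount": 2,
    "Annotations": [
        {
            "AnnotationName": "ai-summary",
            "Size": 67,
            "ETag": '"403c26f2a55cdc54cf931b03be006b75"',
            "LastModified": datetime(2026, 6, 17, 1, 37, 36, tzinfo=timezone.utc),
            "ChecksumAlgorithm": ["CRC32"],
        },
        {
            "AnnotationName": "test-metadata",
            "Size": 78,
            "ETag": '"1fa459dad748f9fcc3be1e3dcc50ea82"',
            "LastModified": datetime(2026, 6, 17, 1, 37, 36, tzinfo=timezone.utc),
            "ChecksumAlgorithm": ["CRC32"],
        },
    ],
}

if __name__ == "__main__":
    for name, info in index_annotations(sample).items():
        print(name, info)
```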
GetObjectAnnotation
resp = s3.get_object_annotation(
Bucket=BUCKET,
Key=KEY,
AnnotationName='test-metadata'
)
body = resp['AnnotationPayload'].read().decode()
# body
'{"project": "annotation-test", "owner": "demo-user", "created": "2026-06-17"}'
# resp (excluding AnnotationPayload)
{
'ETag': '"1fa459dad748f9fcc3be1e3dcc50ea82"',
'ContentLength': 78
}
AnnotationPayload is of type StreamingBody, and the body is retrieved with .read().
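Because annotations can hold either JSON or plain text, a reader often has to handle both. The following sketch reads a file-like payload and returns parsed JSON when possible and the raw text otherwise. `read_annotation` is a hypothetical helper, and `io.BytesIO` stands in for the StreamingBody so that the example runs without AWS access.

```python
import io
import json


def read_annotation(payload_stream, encoding: str = "utf-8"):
    """Read a file-like payload; return parsed JSON if it parses, else the text."""
    text = payload_stream.read().decode(encoding)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


if __name__ == "__main__":
    # io.BytesIO stands in for resp['AnnotationPayload'] (a StreamingBody).
    json_stream = io.BytesIO(b'{"project": "annotation-test", "owner": "demo-user"}')
    text_stream = io.BytesIO(b"AI-generated summary: A test file.")
    print(read_annotation(json_stream))
    print(read_annotation(text_stream))
```

Note that plain text which happens to be valid JSON (for example a bare number) comes back as the parsed value rather than as a string, so callers that need to distinguish the two should track the format separately, for instance in the annotation name.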
DeleteObjectAnnotation
resp = s3.delete_object_annotation(
Bucket=BUCKET,
Key=KEY,
AnnotationName='ai-summary'
)
{} # HTTPStatusCode: 204
After deletion, list the annotations again to confirm.
resp = s3.list_object_annotations(Bucket=BUCKET, Key=KEY)
{
'AnnotationCount': 1,
'Annotations': [
{
'AnnotationName': 'test-metadata',
'Size': 78,
'ETag': '"1fa459dad748f9fcc3be1e3dcc50ea82"',
'LastModified': datetime(2026, 6, 17, 1, 37, 36, tzinfo=tzutc()),
'ChecksumAlgorithm': ['CRC32']
}
]
}
We confirmed that ai-summary has been removed and only test-metadata remains.
Operations with AWS CLI (v2.35.6)
Annotation commands were added in AWS CLI v2.35.6 (released 2026-06-17). This section also covers the key differences from boto3.
PutObjectAnnotation
--annotation-payload takes a streaming blob, so you pass the file path directly. The file:// and fileb:// prefixes cannot be used.
echo -n '{"source":"cli","version":"2.35.6"}' > /tmp/payload.txt
aws s3api put-object-annotation \
--bucket my-annotation-demo-bucket \
--key videos/sample.mp4 \
--annotation-name "cli-test" \
--annotation-payload /tmp/payload.txt \
--region ap-northeast-1
{
"ETag": "\"39ce0435575e8e057d4a919c727ffe0a\"",
"ChecksumCRC64NVME": "SvqIamuCqI0=",
"ChecksumType": "FULL_OBJECT",
"ServerSideEncryption": "AES256",
"Key": "videos/sample.mp4",
"AnnotationName": "cli-test"
}
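When scripting the CLI from Python, the payload has to be written to a file first because the option takes a bare path. The following is a sketch under that assumption: `build_put_annotation_command` is a hypothetical helper that writes the payload bytes exactly (avoiding shell differences in `echo -n`) and builds the argument list using only the flags shown above. It does not execute the command.

```python
import shlex
import tempfile
from pathlib import Path


def build_put_annotation_command(
    bucket: str, key: str, name: str, payload: bytes, region: str, workdir: str
) -> list[str]:
    """Write payload to a file in workdir and return the CLI argument list."""
    payload_path = Path(workdir) / "payload.txt"
    payload_path.write_bytes(payload)  # exact bytes, no trailing newline added
    return [
        "aws", "s3api", "put-object-annotation",
        "--bucket", bucket,
        "--key", key,
        "--annotation-name", name,
        # Bare path: no file:// or fileb:// prefix.
        "--annotation-payload", str(payload_path),
        "--region", region,
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        cmd = build_put_annotation_command(
            "my-annotation-demo-bucket", "videos/sample.mp4", "cli-test",
            b'{"source":"cli","version":"2.35.6"}', "ap-northeast-1", workdir,
        )
        print(shlex.join(cmd))
        # To actually run it, call subprocess.run(cmd, check=True) here,
        # while the temporary payload file still exists.
```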
GetObjectAnnotation
The output destination for the payload is specified as a positional argument (same pattern as s3api get-object).
aws s3api get-object-annotation \
--bucket my-annotation-demo-bucket \
--key videos/sample.mp4 \
--annotation-name "cli-test" \
--region ap-northeast-1 \
/tmp/output.txt
cat /tmp/output.txt
# {"source":"cli","version":"2.35.6"}
ListObjectAnnotations / DeleteObjectAnnotation
# List
aws s3api list-object-annotations \
--bucket my-annotation-demo-bucket \
--key videos/sample.mp4 \
--region ap-northeast-1
# Delete
aws s3api delete-object-annotation \
--bucket my-annotation-demo-bucket \
--key videos/sample.mp4 \
--annotation-name "cli-test" \
--region ap-northeast-1
Differences from boto3
| Item | boto3 | AWS CLI (v2.35.6) |
|---|---|---|
| Payload specification | AnnotationPayload=bytes | --annotation-payload <filepath> (file:// not allowed) |
| Payload retrieval | StreamingBody.read() | Output destination file specified as positional argument |
| Checksum | CRC32 in this verification | CRC64NVME in this verification |
| Annotation on copy | — | Copied with s3 cp/mv/sync using --copy-props all |
Copying with --copy-props all
The --copy-props all option added in v2.35.6 allows you to copy annotations, metadata, and tags together when copying between S3 locations.
aws s3 cp s3://my-annotation-demo-bucket/videos/sample.mp4 \
s3://my-annotation-demo-bucket/videos/sample-copy.mp4 \
--copy-props all \
--region ap-northeast-1
Cross-search with Annotation Table (Athena)
In addition to retrieving annotations for individual objects via API, you can perform cross-search of annotations across an entire bucket using SQL. We verified the procedure for enabling the S3 Metadata Annotation Table and accessing it from Athena.
Relationship with S3 Metadata
The Annotation Table is an extension of the "S3 Metadata" infrastructure announced at re:Invent 2024. S3 Metadata previously provided a Journal Table that records object creation and deletion events, and an Inventory Table that is a snapshot of the object list. With the S3 Annotations release, an Annotation Table has been added for cross-searching annotation payloads.
All three tables are configured through the same MetadataConfiguration and are stored in S3 Tables (Apache Iceberg).
Comparison with Conventional Architecture
Previously, when you wanted to cross-search metadata associated with S3 objects, it was common to store that metadata in an external database such as DynamoDB. DevelopersIO has also covered architectures of this kind in earlier articles.
Comparing these architectures with Annotations yields the following.
| Perspective | Conventional architecture (S3 + Lambda + DynamoDB) | S3 Annotations |
|---|---|---|
| Metadata storage | DynamoDB table | Attached to the S3 object itself |
| Sync mechanism | EventBridge → Lambda / Step Functions | No sync process to external DB required (reflection to Annotation Table is asynchronous) |
| Additional components | Lambda, DynamoDB, EventBridge, etc. | No custom sync process or external DB needed |
| Cross-search | DynamoDB Query / Scan / GSI | Annotation Table + Athena |
| Latency | DynamoDB: millisecond-level | Athena: second-level (suitable for batch) |
| Cost structure | Lambda execution + DynamoDB RCU/WCU | S3 API requests + Athena scan |
Annotations remove the sync pipeline and the external database from the architecture needed for metadata management. On the other hand, if millisecond-level low-latency access is required, the conventional architecture is more appropriate.
In the following verification, annotations have been added to multiple objects to demonstrate the usefulness of cross-search. The targets are videos/sample.mp4, videos/another.mp4, and docs/report.pdf.
Creating an IAM Role
Create a service role that S3 Metadata assumes when it reflects annotation information to the Annotation Table.
Trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "metadata.s3.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Permission policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObjectAnnotation",
"s3:GetObjectVersionAnnotation",
"s3:ListBucket",
"s3:ListBucketVersions"
],
"Resource": [
"arn:aws:s3:::my-annotation-demo-bucket",
"arn:aws:s3:::my-annotation-demo-bucket/*"
]
}
]
}
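The two policy documents above can also be generated and attached from Python. The following is a sketch, not the procedure used in this verification: the helper names and the inline policy name `AnnotationTableAccess` are hypothetical, while `create_role` and `put_role_policy` are the standard boto3 IAM client calls. The IAM client is passed in, so nothing is sent to AWS unless you call `create_annotation_role` with a real client.

```python
import json


def build_trust_policy() -> dict:
    """Trust policy allowing the S3 Metadata service principal to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "metadata.s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_permission_policy(bucket: str) -> dict:
    """Permission policy with the actions listed above, scoped to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectAnnotation",
                    "s3:GetObjectVersionAnnotation",
                    "s3:ListBucket",
                    "s3:ListBucketVersions",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


def create_annotation_role(iam, role_name: str, bucket: str) -> str:
    """Create the role with the given IAM client and return its ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="AnnotationTableAccess",  # hypothetical inline policy name
        PolicyDocument=json.dumps(build_permission_policy(bucket)),
    )
    return role["Role"]["Arn"]


if __name__ == "__main__":
    print(json.dumps(build_permission_policy("my-annotation-demo-bucket"), indent=2))
```

With a real client this would be called as `create_annotation_role(boto3.client("iam"), "S3MetadataAnnotationRole", BUCKET)`.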
Enabling Metadata Configuration
s3.create_bucket_metadata_configuration(
Bucket=BUCKET,
MetadataConfiguration={
'JournalTableConfiguration': {
'RecordExpiration': {'Expiration': 'DISABLED'}
},
'InventoryTableConfiguration': {
'ConfigurationState': 'DISABLED'
},
'AnnotationTableConfiguration': {
'ConfigurationState': 'ENABLED',
'Role': 'arn:aws:iam::123456789012:role/S3MetadataAnnotationRole'
}
}
)
We ran this with boto3 in this verification, and confirmed that it can also be executed with AWS CLI v2.35.6.
Backfill and Confirming ACTIVE Status
In this verification, the TableStatus immediately after creation was BACKFILLING. In this state, a backfill process is reflecting the existing annotations into the table.
resp = s3.get_bucket_metadata_configuration(Bucket=BUCKET)
config = resp['GetBucketMetadataConfigurationResult']['MetadataConfigurationResult']
print(config['AnnotationTableConfigurationResult']['TableStatus'])
BACKFILLING
In this verification, with a small-scale environment of 3 objects and 3 annotations, the status became ACTIVE approximately 25 minutes after creating the Metadata Configuration.
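Since the backfill took tens of minutes even in this small environment, polling the status is more practical than checking by hand. The following is a minimal sketch of such a loop: `wait_until_active` is a hypothetical helper, the status source is injected as a function, and a fake source is used so that the example runs without AWS access. Only BACKFILLING and ACTIVE were observed in this verification, so any status other than ACTIVE is simply treated as "not yet active".

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    interval_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns 'ACTIVE' or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"table still {status} after {timeout_seconds} s")
        sleep(interval_seconds)


if __name__ == "__main__":
    # Fake status source so the sketch runs without AWS access.
    fake_statuses = iter(["BACKFILLING", "BACKFILLING", "ACTIVE"])
    print(wait_until_active(lambda: next(fake_statuses), interval_seconds=0.01))
```

To use it against a real bucket, pass a function that calls `s3.get_bucket_metadata_configuration(Bucket=BUCKET)` and returns the `TableStatus` value extracted as shown above.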
Creating a Federated Catalog in Glue Data Catalog
To access the Annotation Table from Athena, create a federated catalog for S3 Tables in the Glue Data Catalog.
import boto3
glue = boto3.client('glue', region_name='ap-northeast-1')
glue.create_catalog(
Name='s3tablescatalog',
CatalogInput={
'FederatedCatalog': {
'Identifier': 'arn:aws:s3tables:ap-northeast-1:123456789012:bucket/*',
'ConnectionName': 'aws:s3tables'
},
'CreateDatabaseDefaultPermissions': [
{
'Principal': {'DataLakePrincipalIdentifier': 'IAM_ALLOWED_PRINCIPALS'},
'Permissions': ['ALL']
}
],
'CreateTableDefaultPermissions': [
{
'Principal': {'DataLakePrincipalIdentifier': 'IAM_ALLOWED_PRINCIPALS'},
'Permissions': ['ALL']
}
]
}
)
Annotation Table Schema
After the Annotation Table became ACTIVE, we checked it in Athena and found the following column structure.
| Column | Description |
|---|---|
| bucket | Bucket name |
| object_key | Object key |
| object_version_id | Version ID (NULL when versioning is disabled) |
| name | Annotation name |
| last_modified_date | Annotation last modified datetime |
| size | Annotation size (bytes) |
| e_tag | Annotation ETag |
| checksum_algorithm | Checksum algorithm |
| text_value | Annotation payload (text) |
The JSON/text format annotations created in this verification were stored as strings in the text_value column. For annotations saved as JSON strings, Athena's json_extract_scalar can be used to extract internal fields.
Athena Queries
The table path is "s3tablescatalog/aws-s3"."b_<bucket-name>"."annotation".
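When generating queries for several buckets, the table path can be built from the bucket name. The following sketch follows the naming pattern above. `annotation_table_path` is a hypothetical helper, and the catalog name defaults to the one used in this verification.

```python
def annotation_table_path(bucket: str, catalog: str = "s3tablescatalog/aws-s3") -> str:
    """Return the quoted three-part Athena path of a bucket's Annotation Table."""

    def quote(identifier: str) -> str:
        # Double-quoted SQL identifiers escape an embedded quote by doubling it.
        return '"' + identifier.replace('"', '""') + '"'

    return ".".join([quote(catalog), quote(f"b_{bucket}"), quote("annotation")])


if __name__ == "__main__":
    path = annotation_table_path("my-annotation-demo-bucket")
    print(f"SELECT object_key, name, text_value FROM {path} LIMIT 10;")
```

Quoting each part is what allows the slash in the catalog name and the hyphens in the bucket name to appear in the identifier.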
Retrieve All Records
SELECT object_key, name, text_value
FROM "s3tablescatalog/aws-s3"."b_my-annotation-demo-bucket"."annotation"
LIMIT 10;
| object_key | name | text_value |
|---|---|---|
| videos/sample.mp4 | mediainfo | {"codec":"H.265","resolution":"3840x2160","audio_tracks":12} |
| videos/another.mp4 | mediainfo | {"codec":"H.264","resolution":"1920x1080","audio_tracks":2} |
| docs/report.pdf | classification | {"category":"finance","sensitivity":"internal"} |
We confirmed that all 3 annotations are stored.
Filter by JSON Field
You can use json_extract_scalar to specify conditions on JSON-format annotations. Filter by annotation name first, then extract the JSON.
SELECT object_key, name, text_value
FROM "s3tablescatalog/aws-s3"."b_my-annotation-demo-bucket"."annotation"
WHERE name = 'mediainfo'
AND CAST(json_extract_scalar(text_value, '$.audio_tracks') AS INTEGER) > 8;
| object_key | name | text_value |
|---|---|---|
| videos/sample.mp4 | mediainfo | {"codec":"H.265","resolution":"3840x2160","audio_tracks":12} |
With the condition audio_tracks > 8, exactly 1 record was correctly returned. We confirmed that it is possible to perform cross-search on the contents of annotations attached to S3 objects using SQL.
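The queries in this article were run interactively, but the same SQL can be submitted from boto3. The following is a sketch using the standard Athena client calls (`start_query_execution`, `get_query_execution`, `get_query_results`). `run_query` and `rows_to_dicts` are hypothetical helpers, the output location is something you would supply yourself, and only the first page of results is read. The sample at the bottom mimics the result shape so that the conversion can be tried without AWS access.

```python
import time


def rows_to_dicts(results: dict) -> list[dict]:
    """Convert one get_query_results page into row dicts (first row = header)."""
    rows = results["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [col.get("VarCharValue") for col in rows[0]["Data"]]
    return [
        dict(zip(header, (col.get("VarCharValue") for col in row["Data"])))
        for row in rows[1:]
    ]


def run_query(athena, sql: str, output_location: str, poll_seconds: float = 1.0) -> list[dict]:
    """Submit sql, wait for completion, and return the first page of rows."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")
    return rows_to_dicts(athena.get_query_results(QueryExecutionId=query_id))


if __name__ == "__main__":
    # Sample page shaped like an Athena result, using the row returned above.
    sample_page = {
        "ResultSet": {
            "Rows": [
                {"Data": [{"VarCharValue": "object_key"}, {"VarCharValue": "name"}]},
                {"Data": [{"VarCharValue": "videos/sample.mp4"}, {"VarCharValue": "mediainfo"}]},
            ]
        }
    }
    print(rows_to_dicts(sample_page))
```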
Summary
We exercised S3 Annotations with boto3 and the AWS CLI, confirming Put / Get / List / Delete of annotations as well as cross-search using Annotation Table + Athena. All of the annotation operations covered in this article ran successfully with boto3 1.43.31 / botocore 1.43.31 and AWS CLI v2.35.6.
S3 Annotations is a mechanism that allows you to attach more information to S3 objects than the conventional user-defined metadata or object tags. In this verification, we confirmed that JSON strings and text can be saved as annotations and retrieved via individual APIs and the CLI.
Additionally, by enabling the Annotation Table, we were able to query the saved annotations via SQL from Athena. For annotations saved as JSON strings, it was also possible to search using internal fields as conditions with json_extract_scalar.
Previously, cross-searching metadata associated with S3 objects often meant managing that metadata separately with a combination of Lambda, DynamoDB, and other services. Depending on the use case, S3 Annotations and the Annotation Table can cover the whole path from attaching metadata to cross-search with Athena, without a custom sync infrastructure or an external DB.
On the other hand, reflection to the Annotation Table is asynchronous. Even in this small-scale verification environment, it took approximately 25 minutes from creating the Metadata Configuration to becoming ACTIVE. When millisecond-level low-latency access or searches using GSI or similar features are required, conventional architectures such as DynamoDB may still be more appropriate.
In AWS CLI v2.35.6, annotation operation commands were added, and Put / Get / List / Delete could be executed from the CLI as well. With annotations now usable not only from SDKs but also from the CLI, verification and scripting have become easier.


