I built a serverless, convenient semantic search platform with S3 Vectors and Bedrock
Amazon S3 Vectors is a service specialized for storing vector data and performing similarity searches. When combined with Amazon Bedrock's Embedding model, it allows you to build serverless semantic text searches (meaning-based similarity searches).
In this post, I verified how easily a search infrastructure can be built with this configuration, using roughly 60,000 DevelopersIO articles as the dataset. Here are the specific steps.
- Vectorization — Batch vectorization of article summaries using Bedrock Batch Inference
- S3 Vectors Registration — Register vectors + metadata (title, author) to the index
- Search — Vectorize search terms with AWS CLI → Similarity search with S3 Vectors → Output results
Architecture
Prerequisites
Input Data: articles.jsonl
I prepared article ID, title, author, slug, publication date, language, and summary in JSONL format with one article per line.
```json
{"id": "abc123def456ghi789jk01", "title": "dbt platform の Advanced CI で event_time による最適化機能を試してみた", "author": "yasuhara-tomoki", "slug": "dbt-platform-advanced-ci-event-time", "published_at": 1736920200, "language": "ja", "summary": "dbt platform Advanced CI機能は、Enterprise以上のプランで利用可能なCI最適化機能です。..."}
{"id": "xyz987wvu654tsr321qp02", "title": "CloudWatch Application Signals で複数の異なるルックバックウィンドウを持つバーンレートアラームを組み合わせつつ CDK で実装してみる", "author": "masukawa-kentaro", "slug": "cloudwatch-application-signals-burn-rate-alarm-cdk", "published_at": 1733797500, "language": "ja", "summary": "CloudWatch Application Signals でバーンレートアラームを複数のルックバックウィンドウで設..."}
```
| Field | Purpose |
|---|---|
| `id` | S3 Vectors key |
| `title` | Vectorization input + metadata (non-filterable) |
| `author` | Metadata (filterable) |
| `slug` | Metadata (filterable, for article URL construction) |
| `published_at` | Metadata (filterable, epoch seconds for range queries) |
| `language` | Metadata (filterable) |
| `summary` | Vectorization input (combined with title for vectorization) |
※ `published_at` is stored as epoch seconds (an integer) because the S3 Vectors filter comparison operators (`$gte` / `$lt`, etc.) only support numeric types.
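As a concrete illustration, a publication date can be converted to filterable epoch seconds like this (a minimal sketch; `to_epoch_seconds` is a hypothetical helper, not part of the original pipeline):

```python
from datetime import datetime, timezone

def to_epoch_seconds(date_str: str) -> int:
    """Convert a YYYY-MM-DD date to UTC epoch seconds for S3 Vectors range filters."""
    dt = datetime.strptime(date_str, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(to_epoch_seconds('2025-01-15'))  # 1736899200
```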
Model Selection: Amazon Nova Embed v1
I chose amazon.nova-2-multimodal-embeddings-v1:0 for this project. This new model was announced in November 2025 and is currently only available in us-east-1, requiring cross-region usage, but I prioritized performance. At $0.00002 per 1,000 tokens (with an additional 50% discount for Batch Inference), it's extremely cost-effective and suitable for bulk vectorization of tens of thousands of items. Being part of the Bedrock managed service ecosystem means no need to host or manage external models.
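To make the cost claim concrete, here is the arithmetic as a small sketch (the price and discount figures are the ones quoted above; verify current pricing before relying on them):

```python
PRICE_PER_1K_TOKENS = 0.00002  # on-demand price quoted above
BATCH_DISCOUNT = 0.5           # Batch Inference is billed at 50% of on-demand

def embedding_cost(tokens: int, batch: bool = True) -> float:
    """Estimate embedding cost in USD for a given token count."""
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    return cost * BATCH_DISCOUNT if batch else cost

# The 1,000-article pilot described below processed 116,945 tokens
print(f'${embedding_cost(116_945):.6f}')  # → $0.001169
```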
AWS Resources
| Resource | Value |
|---|---|
| Embedding model | amazon.nova-2-multimodal-embeddings-v1:0 |
| Vector dimensions | 1024 |
| S3 Vectors bucket | my-vectors-bucket |
| S3 Vectors index | my-blog-index |
| Batch S3 | my-batch-bucket |
| IAM role | BedrockBatchInferenceRole |
Step 1: Vectorization
To verify accuracy quickly, I ran each step with simple, manually executed Python scripts rather than building an automated pipeline.
1-1. Generate JSONL for Batch Inference
I converted articles.jsonl to the input format for Bedrock Batch Inference.
```python
import json

with open('articles.jsonl') as f_in, open('embed_input.jsonl', 'w') as f_out:
    for line in f_in:
        article = json.loads(line)
        text = f"{article['title']}\n{article['summary']}"
        record = {
            'recordId': article['id'],
            'modelInput': {
                'schemaVersion': 'nova-multimodal-embed-v1',
                'taskType': 'SINGLE_EMBEDDING',
                'singleEmbeddingParams': {
                    'embeddingPurpose': 'GENERIC_INDEX',
                    'embeddingDimension': 1024,
                    'text': {'truncationMode': 'END', 'value': text}
                }
            }
        }
        f_out.write(json.dumps(record, ensure_ascii=False) + '\n')
```
1-2. Upload to S3
```bash
aws s3 cp embed_input.jsonl s3://my-batch-bucket/input-rag/embed_input.jsonl
```
1-3. Submit Batch Inference Job
```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')
resp = bedrock.create_model_invocation_job(
    modelId='amazon.nova-2-multimodal-embeddings-v1:0',
    jobName='rag-vectorize-demo',
    roleArn='arn:aws:iam::<ACCOUNT_ID>:role/BedrockBatchInferenceRole',
    inputDataConfig={
        's3InputDataConfig': {
            's3Uri': 's3://my-batch-bucket/input-rag/embed_input.jsonl'
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 's3://my-batch-bucket/output-rag/'
        }
    }
)
job_arn = resp['jobArn']
print(f'Job ARN: {job_arn}')
```
1-4. Wait for Job Completion
```python
import time

while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f'status: {status}')
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)
```
Processing the 1,000 items (116,945 tokens) took about 15 minutes. The basic procedure for Batch Inference follows an earlier article.
Step 2: S3 Vectors Registration
2-1. Create Index (First Time Only)
I specified title as non-filterable (display only) and the other 4 items as filterable in the metadataConfiguration.
```python
import boto3

s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')
s3vectors.create_index(
    vectorBucketName='my-vectors-bucket',
    indexName='my-blog-index',
    dataType='float32',
    dimension=1024,
    distanceMetric='cosine',
    metadataConfiguration={
        'nonFilterableMetadataKeys': ['title']
    }
)
```
2-2. Combine Vectors + Metadata → Register
I combined the batch output vectors with metadata (title, author, slug, publication date, language) from articles.jsonl for registration.
```python
import json
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

# Load metadata from articles.jsonl
article_meta = {}
with open('articles.jsonl') as f:
    for line in f:
        a = json.loads(line)
        article_meta[a['id']] = {
            'title': a['title'],
            'author': a['author'],
            'slug': a['slug'],
            'published_at': a['published_at'],
            'language': a['language']
        }

# Read batch output, combine with metadata and register
resp = s3.get_object(
    Bucket='my-batch-bucket',
    Key='output-rag/<JOB_ID>/embed_input.jsonl.out'
)
vectors = []
total = 0
for line in resp['Body'].iter_lines():
    result = json.loads(line)
    article_id = result['recordId']
    meta = article_meta.get(article_id, {})
    vectors.append({
        'key': article_id,
        'data': {'float32': result['modelOutput']['embeddings'][0]['embedding']},
        'metadata': {
            'title': meta.get('title', ''),
            'author': meta.get('author', ''),
            'slug': meta.get('slug', ''),
            'published_at': meta.get('published_at', 0),  # store as epoch seconds (int)
            'language': meta.get('language', '')
        }
    })
    # put_vectors accepts at most 100 vectors per call
    if len(vectors) >= 100:
        s3vectors.put_vectors(
            vectorBucketName='my-vectors-bucket',
            indexName='my-blog-index',
            vectors=vectors
        )
        total += len(vectors)
        vectors = []

# Flush the final partial batch
if vectors:
    s3vectors.put_vectors(
        vectorBucketName='my-vectors-bucket',
        indexName='my-blog-index',
        vectors=vectors
    )
    total += len(vectors)

print(f'Registration complete: {total} items')
```
Key points:
- I stored `title`, `author`, `slug`, `published_at`, and `language` in `metadata`. This allows retrieval of article metadata without querying a database during search
- `put_vectors` operates as an upsert: registering with the same key overwrites the vector and metadata
- `put_vectors` has a maximum of 100 items per call, so I registered in batches of 100
S3 Vectors Metadata Constraints
S3 Vectors metadata has these constraints:
| Item | Limit |
|---|---|
| Total metadata size | 40 KB / vector (filterable + non-filterable) |
| Filterable metadata | 2 KB / vector |
| Metadata key count | 50 keys / vector |
| Non-filterable key count | 10 keys / index |
| Supported types | string, number, boolean, list |
Filterable metadata is limited to 2 KB per vector. Metadata used only for display can be registered as non-filterable, which allows up to 40 KB. For that reason, I set `title` as non-filterable metadata.
Metadata Filtering
S3 Vectors supports filtering by metadata via the `filter` parameter of `query_vectors`. You pass the filter expression directly as a dict.
```python
# Search only for articles by a specific author
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"author": {"$eq": "wakatsuki-ryuta"}}
)
```
For date-range filtering, comparison operators such as `$gte` / `$lt` only support Number types. Storing `published_at` as epoch seconds (an integer) makes period filters possible.
```python
from datetime import datetime, timezone

# Only articles from 2026 onwards
epoch_2026 = int(datetime(2026, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"published_at": {"$gte": epoch_2026}}
)

# Only articles from 2025 (range specification)
epoch_2025 = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    ...,
    filter={"published_at": {"$gte": epoch_2025, "$lt": epoch_2026}}
)
```
Main operators:
| Operator | Supported Types | Description |
|---|---|---|
| `$eq` | string, number, boolean | Exact match |
| `$ne` | string, number, boolean | Not equal |
| `$gt` / `$gte` | number | Greater than / greater than or equal |
| `$lt` / `$lte` | number | Less than / less than or equal |
| `$in` / `$nin` | array | Match any / match none |
| `$exists` | boolean | Check field existence |
| `$and` / `$or` | array of filters | Logical AND / OR |
For details, refer to Metadata filtering.
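These operators compose; for example, an author condition and a date range can be joined with `$and`. A minimal sketch (`build_filter` is a hypothetical helper name):

```python
def build_filter(author: str, start_epoch: int, end_epoch: int) -> dict:
    """Build a query_vectors filter: a specific author AND published in [start, end)."""
    return {
        "$and": [
            {"author": {"$eq": author}},
            {"published_at": {"$gte": start_epoch, "$lt": end_epoch}},
        ]
    }

f = build_filter("wakatsuki-ryuta", 1735689600, 1767225600)  # the year 2025
# pass as: s3vectors.query_vectors(..., filter=f)
```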
Step 3: Search
Search Using AWS CLI
You can search using just the AWS CLI without Python.
Vectorize Search Terms
```bash
# Create request body
cat > /tmp/embed_request.json << 'EOF'
{
  "schemaVersion": "nova-multimodal-embed-v1",
  "taskType": "SINGLE_EMBEDDING",
  "singleEmbeddingParams": {
    "embeddingPurpose": "GENERIC_INDEX",
    "embeddingDimension": 1024,
    "text": {
      "truncationMode": "END",
      "value": "Lambda コールドスタート 対策"
    }
  }
}
EOF

# Vectorize with Bedrock (pass Japanese via file)
aws bedrock-runtime invoke-model \
  --model-id amazon.nova-2-multimodal-embeddings-v1:0 \
  --region us-east-1 \
  --content-type application/json \
  --body fileb:///tmp/embed_request.json \
  /tmp/query_vector.json
```
Extract Query Vector and Search with S3 Vectors
```bash
# Extract vector array
python3 -c "
import json
d = json.load(open('/tmp/query_vector.json'))
print(json.dumps(d['embeddings'][0]['embedding']))
" > /tmp/query_vec_array.json

# Similarity search with S3 Vectors
aws s3vectors query-vectors \
  --vector-bucket-name my-vectors-bucket \
  --index-name my-blog-index \
  --query-vector "float32=$(cat /tmp/query_vec_array.json)" \
  --top-k 5 \
  --return-distance \
  --return-metadata \
  --region <YOUR_REGION>
```
Execution Result
```json
{
  "vectors": [
    {
      "distance": 0.1290004849433899,
      "key": "article-001",
      "metadata": {
        "title": "Lambdaのコールドスタート問題と対策について整理する",
        "published_at": 1732491853,
        "language": "ja",
        "author": "manabe-kenji",
        "slug": "lambda-coldstart-measures"
      }
    },
    {
      "distance": 0.1369316577911377,
      "key": "article-002",
      "metadata": {
        "language": "ja",
        "slug": "lambda-cold-start-avoid-hack",
        "published_at": 1559084450,
        "title": "VPC Lambdaのコールドスタートにお悩みの方へ捧ぐコールドスタート予防のハック Lambdaを定期実行するならメモリの割り当ては1600Mがオススメ?!",
        "author": "iwata-tomoya"
      }
    },
    {
      "distance": 0.14367318153381348,
      "key": "article-003",
      "metadata": {
        "title": "LambdaのProvisioned Concurrencyを使って、コールドスタート対策をしてみた #reinvent",
        "language": "ja",
        "published_at": 1576128882,
        "author": "sato-naoya",
        "slug": "lambda-provisioned-concurrency-coldstart"
      }
    },
    {
      "distance": 0.15859538316726685,
      "key": "article-004",
      "metadata": {
        "language": "ja",
        "title": "[速報]コールドスタート対策のLambda定期実行とサヨナラ!! LambdaにProvisioned Concurrencyの設定が追加されました #reinvent",
        "author": "iwata-tomoya",
        "slug": "lambda-support-provisioned-concurrency",
        "published_at": 1575418456
      }
    },
    {
      "distance": 0.16454929113388062,
      "key": "article-005",
      "metadata": {
        "language": "ja",
        "published_at": 1669858986,
        "slug": "session-lamba-snapstart",
        "author": "hamada-koji",
        "title": "Lambdaのコールドスタートを解決するLambda SnapStartのセッションに参加してきた(SVS320) #reinvent"
      }
    }
  ],
  "distanceMetric": "cosine"
}
```
By specifying --return-distance and --return-metadata, I received distance scores and metadata (title, author, slug, publication date, language).
With cosine distance, lower values indicate higher similarity. Articles related to "Lambda cold start" appeared at the top of the results.
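If you prefer a "higher is better" score, cosine similarity can be recovered as `1 - distance`. A small sketch over the result shape shown above (`to_similarity` is a hypothetical helper):

```python
def to_similarity(distance: float) -> float:
    # For a cosine index: cosine distance = 1 - cosine similarity
    return 1.0 - distance

hits = [
    {"key": "article-001", "distance": 0.1290},
    {"key": "article-002", "distance": 0.1369},
]
for h in hits:
    print(h["key"], round(to_similarity(h["distance"]), 4))
# article-001 scores higher (0.871 vs 0.8631)
```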
Scaling to Full Volume
After confirming operation with 1,000 pilot items, I processed the full volume (about 57,000 items) using the same procedure. Since Batch Inference has a limit of 50,000 items per job, I split the input for submission.
| Item | 1,000 Items Pilot | 57,000 Items Full Volume |
|---|---|---|
| Batch Inference | ~15 minutes | ~30 minutes (split submission) |
| S3 Vectors Registration | ~10 seconds | ~10 minutes |
| Embedding Cost | ~$0.002 | ~$0.12 |
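The per-job split can be sketched as a simple chunking helper (`chunk` is a hypothetical function name; the 50,000 limit is the per-job record limit mentioned above):

```python
def chunk(records: list, max_records: int = 50_000) -> list:
    """Split records into sublists that fit the Batch Inference per-job limit."""
    return [records[i:i + max_records] for i in range(0, len(records), max_records)]

# 57,000 records need two jobs
sizes = [len(c) for c in chunk(list(range(57_000)))]
print(sizes)  # [50000, 7000]
```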
Supplement: Supporting Other Embedding Models
For Titan Text Embeddings v2
The modelInput format differs from Nova. Titan uses a simpler format with inputText / dimensions.
```python
record = {
    'recordId': article['id'],
    'modelInput': {
        'inputText': f"{article['title']}\n{article['summary']}",
        'dimensions': 1024,
        'normalize': True
    }
}
```
The vector retrieval path also differs.
| Model | Vector Retrieval Path |
|---|---|
| Nova Embed v1 | modelOutput['embeddings'][0]['embedding'] |
| Titan Embeddings v2 | modelOutput['embedding'] |
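A small dispatch helper (hypothetical, written for this post) can hide the per-model difference when reading batch output, using the paths from the table above:

```python
def extract_embedding(model_id: str, model_output: dict) -> list:
    """Return the embedding vector using the model-specific output path."""
    if model_id.startswith('amazon.nova'):
        return model_output['embeddings'][0]['embedding']
    if model_id.startswith('amazon.titan'):
        return model_output['embedding']
    raise ValueError(f'unsupported model: {model_id}')

nova_out = {'embeddings': [{'embedding': [0.1, 0.2]}]}
titan_out = {'embedding': [0.3, 0.4]}
print(extract_embedding('amazon.nova-2-multimodal-embeddings-v1:0', nova_out))
print(extract_embedding('amazon.titan-embed-text-v2:0', titan_out))
```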
For Cohere Embed v4
Cohere Embed v4 doesn't support Batch Inference, but the real-time API (invoke_model) supports batch processing of up to 96 items at once, making it practical for processing tens of thousands of items.
```python
import boto3, json

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
resp = bedrock.invoke_model(
    modelId='us.cohere.embed-v4:0',
    body=json.dumps({
        "texts": ["Text 1", "Text 2", ...],  # up to 96 items
        "input_type": "search_document",
        "embedding_types": ["float"],
        "output_dimension": 1024
    })
)
vectors = json.loads(resp['body'].read())['embeddings']['float']
```
Notes:
- Inference profile required: directly specifying the model ID `cohere.embed-v4:0` results in a `ValidationException`. Use `us.cohere.embed-v4:0` (cross-region inference profile)
- Change `input_type` for search queries: use `"search_document"` for registration and `"search_query"` for search
- Throttling mitigation: implement exponential-backoff retries for bulk processing
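The throttling mitigation can be sketched as a generic retry wrapper (`with_backoff` is a hypothetical helper; in real code you would catch `botocore.exceptions.ClientError` and check for a `ThrottlingException` error code instead of a bare `Exception`):

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # narrow this to throttling errors in real code
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))

# usage sketch: with_backoff(lambda: bedrock.invoke_model(...))
```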
| Item | Nova Embed v1 | Titan Embeddings v2 | Cohere Embed v4 |
|---|---|---|---|
| Batch Inference | Supported | Supported | Not supported |
| Real-time batch | 1 at a time | 1 at a time | Up to 96/call |
| Model ID | `amazon.nova-2-multimodal-embeddings-v1:0` | `amazon.titan-embed-text-v2:0` | `us.cohere.embed-v4:0` |
| Vector retrieval path | `embeddings[0]['embedding']` | `embedding` | `embeddings['float']` |
| Registration/search distinction | None | None | Specify with `input_type` |
Note: Job Name Constraints
The `jobName` for `create_model_invocation_job` is restricted to the regular expression `[a-zA-Z0-9](-*[a-zA-Z0-9+\-.])*`. Underscores (`_`) cannot be used.
```python
# Error: underscore is not allowed in jobName
bedrock.create_model_invocation_job(jobName='titan-embed-summary_en', ...)

# OK
bedrock.create_model_invocation_job(jobName='titan-embed-summary-en', ...)
```
If your summary type names contain underscores, convert them to hyphens before using them in job names.
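A quick sanitizer plus a check against the documented pattern (a sketch; `sanitize_job_name` and `JOB_NAME_RE` are hypothetical names):

```python
import re

# Pattern quoted above for create_model_invocation_job jobName
JOB_NAME_RE = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9+\-.])*$')

def sanitize_job_name(name: str) -> str:
    """Replace underscores with hyphens so the name satisfies the jobName pattern."""
    return name.replace('_', '-')

name = sanitize_job_name('titan-embed-summary_en')
print(name, bool(JOB_NAME_RE.match(name)))  # titan-embed-summary-en True
```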
Conclusion
I built a semantic search by bulk vectorizing article summaries with Bedrock Batch Inference and registering them with metadata in S3 Vectors. Including title, author, slug, publication date, and language in the metadata allows retrieval of article information without querying a database during search, and enables filtering.
With Amazon S3 Vectors × Bedrock (amazon.nova-2-multimodal-embeddings-v1:0), you can easily build a search infrastructure that's serverless and fully managed.
However, the number of items retrievable per query is limited to 100, so different approaches might be needed for complex weighting or hybrid search on large-scale data. While OpenSearch is suitable for full-text search and complex ranking adjustments, try S3 Vectors if you want to experiment with low-cost, management-free semantic search first.