Building a Serverless Lightweight Semantic Search Foundation with S3 Vectors and Bedrock

This article shows how to implement serverless semantic search by combining Amazon S3 Vectors and Amazon Bedrock, and how to build a low-cost, low-maintenance search foundation using a concrete example of roughly 60,000 article records.
2026.02.16


Amazon S3 Vectors is a service specialized for storing vector data and similarity search. When combined with Amazon Bedrock's Embedding models, you can build serverless semantic text search (meaning-based similarity search).

In this article, I verified how easily a search foundation can be built with this configuration, using approximately 60,000 DevelopersIO articles as the dataset. The specific steps are:

  1. Vectorization — Bulk vectorize article summaries with Bedrock Batch Inference
  2. S3 Vectors Registration — Register vectors + metadata (title, author) to the index
  3. Search — Vectorize search terms with AWS CLI → Similarity search with S3 Vectors → Output results

Architecture

The overall flow: articles.jsonl → bulk embedding with Bedrock Batch Inference → registration of vectors + metadata into an S3 Vectors index → at query time, embed the search term with Bedrock and run a similarity search with query_vectors.

Prerequisites

Input Data: articles.jsonl

I prepared article ID, title, author, slug, publication date, language, and summary in JSONL format with one article per line.

{"id": "abc123def456ghi789jk01", "title": "dbt platform の Advanced CI で event_time による最適化機能を試してみた", "author": "yasuhara-tomoki", "slug": "dbt-platform-advanced-ci-event-time", "published_at": 1736920200, "language": "ja", "summary": "dbt platform Advanced CI機能は、Enterprise以上のプランで利用可能なCI最適化機能です。..."}
{"id": "xyz987wvu654tsr321qp02", "title": "CloudWatch Application Signals で複数の異なるルックバックウィンドウを持つバーンレートアラームを組み合わせつつ CDK で実装してみる", "author": "masukawa-kentaro", "slug": "cloudwatch-application-signals-burn-rate-alarm-cdk", "published_at": 1733797500, "language": "ja", "summary": "CloudWatch Application Signals でバーンレートアラームを複数のルックバックウィンドウで設..."}
Field          Usage
id             S3 Vectors key
title          Vectorization input + metadata (non-filterable)
author         Metadata (filterable)
slug           Metadata (filterable, used to construct the article URL)
published_at   Metadata (filterable, epoch seconds for range queries)
language       Metadata (filterable)
summary        Vectorization input (combined with title for vectorization)

published_at is stored as epoch seconds (integer) because S3 Vectors filter comparison operators ($gte / $lt, etc.) only support numeric types.
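
If the source data holds ISO-8601 timestamps instead, a quick conversion to epoch seconds at ingestion time looks like this (the input value is only an illustration):

from datetime import datetime

# "2025-01-15T06:30:00+00:00" -> 1736922600 (epoch seconds, UTC)
published_at = int(datetime.fromisoformat('2025-01-15T06:30:00+00:00').timestamp())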

Model Selection: Amazon Nova 2 Multimodal Embeddings

I chose amazon.nova-2-multimodal-embeddings-v1:0 for this project. The model was announced in November 2025 and is currently available only in us-east-1, so it has to be invoked cross-region, but I prioritized performance. At $0.00002 per 1,000 tokens (with an additional 50% discount for Batch Inference), it is very cost-effective and well suited to bulk vectorization of tens of thousands of records. A key advantage is that everything runs within Bedrock's managed service, so there is no external model to host or manage.

AWS Resources

Resource            Value
Embedding model     amazon.nova-2-multimodal-embeddings-v1:0
Vector dimensions   1024
S3 Vectors bucket   my-vectors-bucket
S3 Vectors index    my-blog-index
Batch S3 bucket     my-batch-bucket
IAM role            BedrockBatchInferenceRole

Step 1: Vectorization

To verify accuracy quickly, I ran each step with simple Python scripts executed by hand.

1-1. Generate JSONL for Batch Inference

I converted articles.jsonl to the input format for Bedrock Batch Inference.

import json

# Convert articles.jsonl into the Bedrock Batch Inference input format
with open('articles.jsonl') as f_in, open('embed_input.jsonl', 'w') as f_out:
    for line in f_in:
        article = json.loads(line)
        # Title and summary combined form the embedding input
        text = f"{article['title']}\n{article['summary']}"
        record = {
            'recordId': article['id'],  # reused later as the S3 Vectors key
            'modelInput': {
                'schemaVersion': 'nova-multimodal-embed-v1',
                'taskType': 'SINGLE_EMBEDDING',
                'singleEmbeddingParams': {
                    'embeddingPurpose': 'GENERIC_INDEX',
                    'embeddingDimension': 1024,
                    'text': {'truncationMode': 'END', 'value': text}
                }
            }
        }
        f_out.write(json.dumps(record, ensure_ascii=False) + '\n')

1-2. Upload to S3

aws s3 cp embed_input.jsonl s3://my-batch-bucket/input-rag/embed_input.jsonl

1-3. Submit Batch Inference Job

import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

resp = bedrock.create_model_invocation_job(
    modelId='amazon.nova-2-multimodal-embeddings-v1:0',
    jobName='rag-vectorize-demo',
    roleArn='arn:aws:iam::<ACCOUNT_ID>:role/BedrockBatchInferenceRole',
    inputDataConfig={
        's3InputDataConfig': {
            's3Uri': 's3://my-batch-bucket/input-rag/embed_input.jsonl'
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 's3://my-batch-bucket/output-rag/'
        }
    }
)
job_arn = resp['jobArn']
print(f'Job ARN: {job_arn}')

1-4. Wait for Job Completion

import time

while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f'status: {status}')
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)

It took about 15 minutes for 1,000 records with 116,945 input tokens. The basic procedure for Batch Inference follows this article:

https://dev.classmethod.jp/articles/amazon-bedrock-batch-inference-structured-outputs/

Step 2: S3 Vectors Registration

2-1. Create Index (First Time Only)

In metadataConfiguration, I specified title as non-filterable (display only) and made the other 4 items filterable.

import boto3

s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

s3vectors.create_index(
    vectorBucketName='my-vectors-bucket',
    indexName='my-blog-index',
    dataType='float32',
    dimension=1024,
    distanceMetric='cosine',
    metadataConfiguration={
        'nonFilterableMetadataKeys': ['title']
    }
)

2-2. Combine Vector + Metadata → Register

I combined the batch output vectors with metadata (title, author, slug, publication date, language) from articles.jsonl for registration.

import json
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

# Load metadata from articles.jsonl
article_meta = {}
with open('articles.jsonl') as f:
    for line in f:
        a = json.loads(line)
        article_meta[a['id']] = {
            'title': a['title'],
            'author': a['author'],
            'slug': a['slug'],
            'published_at': a['published_at'],
            'language': a['language']
        }

# Read batch output, combine with metadata and register
resp = s3.get_object(
    Bucket='my-batch-bucket',
    Key='output-rag/<JOB_ID>/embed_input.jsonl.out'
)

vectors = []
total = 0
for line in resp['Body'].iter_lines():
    result = json.loads(line)
    article_id = result['recordId']
    meta = article_meta.get(article_id, {})

    vectors.append({
        'key': article_id,
        'data': {'float32': result['modelOutput']['embeddings'][0]['embedding']},
        'metadata': {
            'title': meta.get('title', ''),
            'author': meta.get('author', ''),
            'slug': meta.get('slug', ''),
            'published_at': meta.get('published_at', 0),  # stored as epoch seconds (int)
            'language': meta.get('language', '')
        }
    })

    if len(vectors) >= 100:
        s3vectors.put_vectors(
            vectorBucketName='my-vectors-bucket',
            indexName='my-blog-index',
            vectors=vectors
        )
        total += len(vectors)
        vectors = []

if vectors:
    s3vectors.put_vectors(
        vectorBucketName='my-vectors-bucket',
        indexName='my-blog-index',
        vectors=vectors
    )
    total += len(vectors)

print(f'Registration complete: {total} records')

Key points:

  • I stored title, author, slug, published_at, and language in metadata. This allows retrieving article metadata without querying a database during search
  • put_vectors operates as an upsert. Registering with the same key overwrites the vector and metadata
  • put_vectors has a maximum of 100 items per call, so I registered in batches of 100

S3 Vectors Metadata Constraints

S3 Vectors has the following metadata constraints:

Item                   Limit
Total metadata size    40 KB / vector (filterable + non-filterable)
Filterable metadata    2 KB / vector
Metadata keys          50 keys / vector
Non-filterable keys    10 keys / index
Supported types        string, number, boolean, list

Filterable metadata is limited to 2 KB per vector. For display-only data (such as the article summary), non-filterable metadata can hold up to 40 KB, but the set of non-filterable keys is fixed at index creation and cannot be changed afterwards, so I decided not to use this option.
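
As a sanity check before calling put_vectors, the metadata size can be estimated from its JSON-serialized length. This is a rough sketch under the assumption that the limits roughly correspond to the serialized size; it is not the exact accounting S3 Vectors performs:

import json

NON_FILTERABLE_KEYS = {'title'}  # keys declared non-filterable at index creation

def check_metadata_size(metadata, filterable_limit=2 * 1024, total_limit=40 * 1024):
    # Estimate sizes from the UTF-8 length of the JSON-serialized metadata
    total_size = len(json.dumps(metadata, ensure_ascii=False).encode('utf-8'))
    filterable = {k: v for k, v in metadata.items() if k not in NON_FILTERABLE_KEYS}
    filterable_size = len(json.dumps(filterable, ensure_ascii=False).encode('utf-8'))

    if filterable_size > filterable_limit:
        raise ValueError(f'filterable metadata too large: {filterable_size} bytes')
    if total_size > total_limit:
        raise ValueError(f'total metadata too large: {total_size} bytes')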

Metadata Filtering

S3 Vectors supports filtering by metadata using the filter parameter in query_vectors. A dict is passed directly to filter.

# Search only articles by a specific author
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"author": {"$eq": "wakatsuki-ryuta"}}
)

For date range filtering, comparison operators like $gte / $lt only work with Number types. In this case, storing published_at as epoch seconds (integer) enables period specification:

from datetime import datetime, timezone

# Only articles from 2026 onwards
epoch_2026 = int(datetime(2026, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"published_at": {"$gte": epoch_2026}}
)

# Only articles from 2025 (range specification)
epoch_2025 = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    ...,
    filter={"published_at": {"$gte": epoch_2025, "$lt": epoch_2026}}
)

Main operators:

Operator      Supported types            Description
$eq           string, number, boolean    Exact match
$ne           string, number, boolean    Not equal
$gt / $gte    number                     Greater than / greater than or equal
$lt / $lte    number                     Less than / less than or equal
$in / $nin    array                      Match any / match none
$exists       boolean                    Check field existence
$and / $or    array of filters           Logical AND / OR

For details, see Metadata filtering.
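
Operators can also be combined. The following sketch assumes the MongoDB-style nesting shown in the table, with $and taking an array of sub-filters (see the Metadata filtering documentation for the exact grammar):

# Japanese articles published in 2025 by either of two authors
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={
        "$and": [
            {"language": {"$eq": "ja"}},
            {"author": {"$in": ["wakatsuki-ryuta", "iwata-tomoya"]}},
            {"published_at": {"$gte": epoch_2025, "$lt": epoch_2026}}
        ]
    }
)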

Step 3: Search

Searching with AWS CLI

You can also search with just the AWS CLI, without writing any Python.

Vectorize Search Term

# Create request body
cat > /tmp/embed_request.json << 'EOF'
{
  "schemaVersion": "nova-multimodal-embed-v1",
  "taskType": "SINGLE_EMBEDDING",
  "singleEmbeddingParams": {
    "embeddingPurpose": "GENERIC_INDEX",
    "embeddingDimension": 1024,
    "text": {
      "truncationMode": "END",
      "value": "Lambda コールドスタート 対策"
    }
  }
}
EOF

# Vectorize with Bedrock (pass Japanese text via file)
aws bedrock-runtime invoke-model \
  --model-id amazon.nova-2-multimodal-embeddings-v1:0 \
  --region us-east-1 \
  --content-type application/json \
  --body fileb:///tmp/embed_request.json \
  /tmp/query_vector.json

Extract Query Vector and Search with S3 Vectors

# Extract vector array
python3 -c "
import json
d = json.load(open('/tmp/query_vector.json'))
print(json.dumps(d['embeddings'][0]['embedding']))
" > /tmp/query_vec_array.json

# Similarity search with S3 Vectors
aws s3vectors query-vectors \
  --vector-bucket-name my-vectors-bucket \
  --index-name my-blog-index \
  --query-vector "float32=$(cat /tmp/query_vec_array.json)" \
  --top-k 5 \
  --return-distance \
  --return-metadata \
  --region <YOUR_REGION>

Execution Result

{
    "vectors": [
        {
            "distance": 0.1290004849433899,
            "key": "article-001",
            "metadata": {
                "title": "Lambdaのコールドスタート問題と対策について整理する",
                "published_at": 1732491853,
                "language": "ja",
                "author": "manabe-kenji",
                "slug": "lambda-coldstart-measures"
            }
        },
        {
            "distance": 0.1369316577911377,
            "key": "article-002",
            "metadata": {
                "language": "ja",
                "slug": "lambda-cold-start-avoid-hack",
                "published_at": 1559084450,
                "title": "VPC Lambdaのコールドスタートにお悩みの方へ捧ぐコールドスタート予防のハック Lambdaを定期実行するならメモリの割り当ては1600Mがオススメ?!",
                "author": "iwata-tomoya"
            }
        },
        {
            "distance": 0.14367318153381348,
            "key": "article-003",
            "metadata": {
                "title": "LambdaのProvisioned Concurrencyを使って、コールドスタート対策をしてみた #reinvent",
                "language": "ja",
                "published_at": 1576128882,
                "author": "sato-naoya",
                "slug": "lambda-provisioned-concurrency-coldstart"
            }
        },
        {
            "distance": 0.15859538316726685,
            "key": "article-004",
            "metadata": {
                "language": "ja",
                "title": "[速報]コールドスタート対策のLambda定期実行とサヨナラ!! LambdaにProvisioned Concurrencyの設定が追加されました  #reinvent",
                "author": "iwata-tomoya",
                "slug": "lambda-support-provisioned-concurrency",
                "published_at": 1575418456
            }
        },
        {
            "distance": 0.16454929113388062,
            "key": "article-005",
            "metadata": {
                "language": "ja",
                "published_at": 1669858986,
                "slug": "session-lamba-snapstart",
                "author": "hamada-koji",
                "title": "Lambdaのコールドスタートを解決するLambda SnapStartのセッションに参加してきた(SVS320) #reinvent"
            }
        }
    ],
    "distanceMetric": "cosine"
}

By specifying --return-distance and --return-metadata, I received distance scores and metadata (title, author, slug, publication date, language).

With cosine distance, lower values indicate higher similarity. Articles related to "Lambda cold start" appeared in the top results.
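
For reference, the same search flow can be written in Python with boto3. This is a minimal sketch assuming the same request schema and response shape as the CLI example above:

import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

def search(query, top_k=5):
    # Vectorize the search term with the same request body as the CLI example
    body = {
        'schemaVersion': 'nova-multimodal-embed-v1',
        'taskType': 'SINGLE_EMBEDDING',
        'singleEmbeddingParams': {
            'embeddingPurpose': 'GENERIC_INDEX',
            'embeddingDimension': 1024,
            'text': {'truncationMode': 'END', 'value': query}
        }
    }
    resp = bedrock_runtime.invoke_model(
        modelId='amazon.nova-2-multimodal-embeddings-v1:0',
        contentType='application/json',
        body=json.dumps(body)
    )
    query_vector = json.loads(resp['body'].read())['embeddings'][0]['embedding']

    # Similarity search with S3 Vectors
    return s3vectors.query_vectors(
        vectorBucketName='my-vectors-bucket',
        indexName='my-blog-index',
        queryVector={'float32': query_vector},
        topK=top_k,
        returnDistance=True,
        returnMetadata=True
    )

for hit in search('Lambda コールドスタート 対策')['vectors']:
    print(f"{hit['distance']:.4f}  {hit['metadata']['title']}")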

Scaling to Full Volume

After confirming functionality with the 1,000-record pilot, I processed the full dataset (about 57,000 records) using the same procedure. Due to the 50,000-record limit per Batch Inference job, I split the input for submission.
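
Splitting the input is straightforward; here is a minimal sketch (the chunk size is the per-job limit, the output file names are my own choice):

CHUNK_SIZE = 50_000  # maximum records per Batch Inference job

with open('embed_input.jsonl') as f:
    lines = f.readlines()

for i in range(0, len(lines), CHUNK_SIZE):
    part = i // CHUNK_SIZE + 1
    with open(f'embed_input_part{part}.jsonl', 'w') as out:
        out.writelines(lines[i:i + CHUNK_SIZE])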

Item                      1,000-record pilot   57,000-record full run
Batch Inference           ~15 min              ~30 min (split submission)
S3 Vectors registration   ~10 sec              ~10 min
Embedding cost            ~$0.002              ~$0.12

Summary

By bulk vectorizing article summaries with Bedrock Batch Inference and registering them in S3 Vectors with metadata, I built a semantic search system. Including title, author, slug, publication date, and language in the metadata allows retrieving article information without querying a database during search, and enables filtering capabilities.

With Amazon S3 Vectors × Bedrock (amazon.nova-2-multimodal-embeddings-v1:0), you can easily build a search infrastructure that's serverless and fully managed.

However, each query returns at most 100 results, so a different approach may be necessary for complex weighting or hybrid search over large datasets. OpenSearch may be a better fit when full-text search or fine-grained ranking adjustments are required, but if you want to try semantic search at low cost and with minimal operational overhead, start with S3 Vectors.
