Building a Serverless Lightweight Semantic Search Foundation with S3 Vectors and Bedrock
Amazon S3 Vectors is a service specialized in storing vector data and performing similarity search. Combined with Amazon Bedrock's embedding models, it lets you build serverless semantic text search (meaning-based similarity search).
In this article, I used approximately 60,000 DevelopersIO articles to verify how easily a search foundation can be built with this configuration. The specific steps:
- Vectorization — Bulk vectorize article summaries with Bedrock Batch Inference
- S3 Vectors Registration — Register vectors + metadata (title, author) to the index
- Search — Vectorize search terms with AWS CLI → Similarity search with S3 Vectors → Output results
Architecture
Prerequisites
Input Data: articles.jsonl
I prepared article ID, title, author, slug, publication date, language, and summary in JSONL format with one article per line.
{"id": "abc123def456ghi789jk01", "title": "dbt platform の Advanced CI で event_time による最適化機能を試してみた", "author": "yasuhara-tomoki", "slug": "dbt-platform-advanced-ci-event-time", "published_at": 1736920200, "language": "ja", "summary": "dbt platform Advanced CI機能は、Enterprise以上のプランで利用可能なCI最適化機能です。..."}
{"id": "xyz987wvu654tsr321qp02", "title": "CloudWatch Application Signals で複数の異なるルックバックウィンドウを持つバーンレートアラームを組み合わせつつ CDK で実装してみる", "author": "masukawa-kentaro", "slug": "cloudwatch-application-signals-burn-rate-alarm-cdk", "published_at": 1733797500, "language": "ja", "summary": "CloudWatch Application Signals でバーンレートアラームを複数のルックバックウィンドウで設..."}
| Field | Usage |
|---|---|
| `id` | S3 Vectors key |
| `title` | Vectorization input + metadata (non-filterable) |
| `author` | Metadata (filterable) |
| `slug` | Metadata (filterable, for article URL construction) |
| `published_at` | Metadata (filterable, epoch seconds for range queries) |
| `language` | Metadata (filterable) |
| `summary` | Vectorization input (title + summary combined) |
※ published_at is stored as epoch seconds (integer) because S3 Vectors filter comparison operators ($gte / $lt, etc.) only support numeric types.
Model Selection: Amazon Nova Multimodal Embeddings
I chose amazon.nova-2-multimodal-embeddings-v1:0 for this project. The model was announced in November 2025 and is currently available only in us-east-1, so it has to be invoked cross-region, but I prioritized performance. At $0.00002 per 1,000 tokens (with Batch Inference offering a further 50% discount), it is very cost-effective and well suited to bulk vectorization of tens of thousands of records. A key advantage is that everything runs inside Bedrock's managed service, so there is no external model to host or operate.
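As a rough sanity check on cost (my own back-of-envelope arithmetic, assuming the pilot's measured rate of 116,945 tokens per 1,000 articles from Step 1 holds across the corpus):

# Rough embedding cost estimate for ~60,000 articles (assumption: ~117 tokens/article)
rate_per_1k_tokens = 0.00002
tokens_per_article = 116_945 / 1_000   # measured in the 1,000-record pilot
articles = 60_000
cost = articles * tokens_per_article * rate_per_1k_tokens / 1_000
print(f'${cost:.2f} on demand, ${cost / 2:.2f} with the 50% batch discount')
# -> roughly $0.14 on demand, $0.07 with the batch discount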
AWS Resources
| Resource | Value |
|---|---|
| Embedding model | amazon.nova-2-multimodal-embeddings-v1:0 |
| Vector dimensions | 1024 |
| S3 Vectors bucket | my-vectors-bucket |
| S3 Vectors index | my-blog-index |
| Batch S3 | my-batch-bucket |
| IAM role | BedrockBatchInferenceRole |
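The table assumes the buckets and role already exist. As a minimal sketch for the bucket side (assuming the boto3 s3vectors client's create_vector_bucket operation; names match the table):

import boto3

s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')
s3 = boto3.client('s3', region_name='us-east-1')

# Vector bucket: created via the S3 Vectors API, not the regular S3 API
s3vectors.create_vector_bucket(vectorBucketName='my-vectors-bucket')

# Regular S3 bucket for Batch Inference input/output
s3.create_bucket(Bucket='my-batch-bucket')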
Step 1: Vectorization
For quick accuracy verification, I proceeded with manual Python scripts.
1-1. Generate JSONL for Batch Inference
I converted articles.jsonl to the input format for Bedrock Batch Inference.
import json

with open('articles.jsonl') as f_in, open('embed_input.jsonl', 'w') as f_out:
    for line in f_in:
        article = json.loads(line)
        # Combine title and summary as the vectorization input
        text = f"{article['title']}\n{article['summary']}"
        record = {
            'recordId': article['id'],  # carried through to the batch output
            'modelInput': {
                'schemaVersion': 'nova-multimodal-embed-v1',
                'taskType': 'SINGLE_EMBEDDING',
                'singleEmbeddingParams': {
                    'embeddingPurpose': 'GENERIC_INDEX',
                    'embeddingDimension': 1024,
                    'text': {'truncationMode': 'END', 'value': text}
                }
            }
        }
        f_out.write(json.dumps(record, ensure_ascii=False) + '\n')
1-2. Upload to S3
aws s3 cp embed_input.jsonl s3://my-batch-bucket/input-rag/embed_input.jsonl
1-3. Submit Batch Inference Job
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

resp = bedrock.create_model_invocation_job(
    modelId='amazon.nova-2-multimodal-embeddings-v1:0',
    jobName='rag-vectorize-demo',
    roleArn='arn:aws:iam::<ACCOUNT_ID>:role/BedrockBatchInferenceRole',
    inputDataConfig={
        's3InputDataConfig': {
            's3Uri': 's3://my-batch-bucket/input-rag/embed_input.jsonl'
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 's3://my-batch-bucket/output-rag/'
        }
    }
)
job_arn = resp['jobArn']
print(f'Job ARN: {job_arn}')
1-4. Wait for Job Completion
import time

while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f'status: {status}')
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)
It took about 15 minutes for 1,000 records (116,945 input tokens). The basic Batch Inference procedure follows the approach covered in a separate article.
Step 2: S3 Vectors Registration
2-1. Create Index (First Time Only)
In metadataConfiguration, I specified title as non-filterable (display only) and made the other 4 items filterable.
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

s3vectors.create_index(
    vectorBucketName='my-vectors-bucket',
    indexName='my-blog-index',
    dataType='float32',
    dimension=1024,
    distanceMetric='cosine',
    metadataConfiguration={
        'nonFilterableMetadataKeys': ['title']
    }
)
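To confirm the settings took effect, the index definition can be read back. A minimal sketch, assuming get_index returns the details under an index key:

# Read back the index to verify dimension, distance metric, and metadata config
resp = s3vectors.get_index(
    vectorBucketName='my-vectors-bucket',
    indexName='my-blog-index'
)
print(resp['index']['dimension'], resp['index']['distanceMetric'])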
2-2. Combine Vector + Metadata → Register
I combined the batch output vectors with metadata (title, author, slug, publication date, language) from articles.jsonl for registration.
import json
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

# Load metadata from articles.jsonl
article_meta = {}
with open('articles.jsonl') as f:
    for line in f:
        a = json.loads(line)
        article_meta[a['id']] = {
            'title': a['title'],
            'author': a['author'],
            'slug': a['slug'],
            'published_at': a['published_at'],
            'language': a['language']
        }

# Read batch output, combine with metadata, and register
resp = s3.get_object(
    Bucket='my-batch-bucket',
    Key='output-rag/<JOB_ID>/embed_input.jsonl.out'
)

vectors = []
total = 0
for line in resp['Body'].iter_lines():
    result = json.loads(line)
    article_id = result['recordId']
    meta = article_meta.get(article_id, {})
    vectors.append({
        'key': article_id,
        'data': {'float32': result['modelOutput']['embeddings'][0]['embedding']},
        'metadata': {
            'title': meta.get('title', ''),
            'author': meta.get('author', ''),
            'slug': meta.get('slug', ''),
            'published_at': meta.get('published_at', 0),  # epoch seconds (int)
            'language': meta.get('language', '')
        }
    })
    # put_vectors accepts at most 100 vectors per call
    if len(vectors) >= 100:
        s3vectors.put_vectors(
            vectorBucketName='my-vectors-bucket',
            indexName='my-blog-index',
            vectors=vectors
        )
        total += len(vectors)
        vectors = []

# Flush the remainder
if vectors:
    s3vectors.put_vectors(
        vectorBucketName='my-vectors-bucket',
        indexName='my-blog-index',
        vectors=vectors
    )
    total += len(vectors)

print(f'Registration complete: {total} records')
Key points:
- I stored `title`, `author`, `slug`, `published_at`, and `language` in `metadata`. This allows retrieving article metadata without querying a database during search (a spot check with `get_vectors` follows this list).
- `put_vectors` operates as an upsert: registering with the same key overwrites the vector and metadata.
- `put_vectors` accepts a maximum of 100 items per call, so I registered in batches of 100.
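As a spot check that vectors and metadata were registered as intended, keys can be read back with get_vectors. A minimal sketch (the key below is the example ID from articles.jsonl):

# Fetch a registered vector by key and inspect its metadata
resp = s3vectors.get_vectors(
    vectorBucketName='my-vectors-bucket',
    indexName='my-blog-index',
    keys=['abc123def456ghi789jk01'],  # example ID from articles.jsonl
    returnData=False,      # skip the 1024-dimension float array
    returnMetadata=True
)
for v in resp['vectors']:
    print(v['key'], v['metadata'])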
S3 Vectors Metadata Constraints
S3 Vectors has the following metadata constraints:
| Item | Limitation |
|---|---|
| Total metadata size | 40 KB / vector (filterable + non-filterable) |
| Filterable metadata | 2 KB / vector |
| Metadata keys | 50 keys / vector |
| Non-filterable keys | 10 keys / index |
| Supported types | string, number, boolean, list |
Filterable metadata is limited to 2 KB per vector. For display-only values, non-filterable metadata offers a larger 40 KB budget, but because the non-filterable key configuration is fixed at index creation and cannot be changed afterward, I limited its use to `title` and decided not to rely on it further.
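Because long string values can quickly eat into the 2 KB filterable budget, it helps to estimate metadata size before registration. A rough sketch (my own helper, approximating size via JSON serialization rather than S3 Vectors' exact accounting):

import json

def filterable_size_bytes(metadata, non_filterable=('title',)):
    # Approximate the filterable portion of a metadata dict in UTF-8 bytes
    filterable = {k: v for k, v in metadata.items() if k not in non_filterable}
    return len(json.dumps(filterable, ensure_ascii=False).encode('utf-8'))

meta = {
    'title': 'dbt platform の Advanced CI で event_time による最適化機能を試してみた',
    'author': 'yasuhara-tomoki',
    'slug': 'dbt-platform-advanced-ci-event-time',
    'published_at': 1736920200,
    'language': 'ja'
}
assert filterable_size_bytes(meta) <= 2048, 'over the 2 KB filterable limit'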
Metadata Filtering
S3 Vectors supports filtering by metadata using the filter parameter in query_vectors. A dict is passed directly to filter.
# Search only articles by a specific author
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"author": {"$eq": "wakatsuki-ryuta"}}
)
For date range filtering, comparison operators like $gte / $lt only work with Number types. In this case, storing published_at as epoch seconds (integer) enables period specification:
from datetime import datetime, timezone

# Only articles from 2026 onwards
epoch_2026 = int(datetime(2026, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"published_at": {"$gte": epoch_2026}}
)

# Only articles from 2025 (range specification)
epoch_2025 = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    ...,
    filter={"published_at": {"$gte": epoch_2025, "$lt": epoch_2026}}
)
Main operators:

| Operator | Supported Types | Description |
|---|---|---|
| `$eq` | string, number, boolean | Exact match |
| `$ne` | string, number, boolean | Not equal |
| `$gt` / `$gte` | number | Greater than / greater than or equal |
| `$lt` / `$lte` | number | Less than / less than or equal |
| `$in` / `$nin` | array | Match any / match none |
| `$exists` | boolean | Check field existence |
| `$and` / `$or` | array of filters | Logical AND / OR |
For details, see Metadata filtering.
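Operators can also be combined. For example, restricting results to one author within 2025 (a sketch assembled from the operators above, reusing the epoch values defined earlier):

# Articles by a specific author, published during 2025
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"$and": [
        {"author": {"$eq": "wakatsuki-ryuta"}},
        {"published_at": {"$gte": epoch_2025, "$lt": epoch_2026}}
    ]}
)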
Step 3: Search
Searching with AWS CLI
You can search without Python, using only the AWS CLI.
Vectorize Search Term
# Create request body
cat > /tmp/embed_request.json << 'EOF'
{
  "schemaVersion": "nova-multimodal-embed-v1",
  "taskType": "SINGLE_EMBEDDING",
  "singleEmbeddingParams": {
    "embeddingPurpose": "GENERIC_INDEX",
    "embeddingDimension": 1024,
    "text": {
      "truncationMode": "END",
      "value": "Lambda コールドスタート 対策"
    }
  }
}
EOF
# Vectorize with Bedrock (pass Japanese text via file)
aws bedrock-runtime invoke-model \
--model-id amazon.nova-2-multimodal-embeddings-v1:0 \
--region us-east-1 \
--content-type application/json \
--body fileb:///tmp/embed_request.json \
/tmp/query_vector.json
Extract Query Vector and Search with S3 Vectors
# Extract vector array
python3 -c "
import json
d = json.load(open('/tmp/query_vector.json'))
print(json.dumps(d['embeddings'][0]['embedding']))
" > /tmp/query_vec_array.json
# Similarity search with S3 Vectors
aws s3vectors query-vectors \
--vector-bucket-name my-vectors-bucket \
--index-name my-blog-index \
--query-vector "float32=$(cat /tmp/query_vec_array.json)" \
--top-k 5 \
--return-distance \
--return-metadata \
--region <YOUR_REGION>
Execution Result
{
"vectors": [
{
"distance": 0.1290004849433899,
"key": "article-001",
"metadata": {
"title": "Lambdaのコールドスタート問題と対策について整理する",
"published_at": 1732491853,
"language": "ja",
"author": "manabe-kenji",
"slug": "lambda-coldstart-measures"
}
},
{
"distance": 0.1369316577911377,
"key": "article-002",
"metadata": {
"language": "ja",
"slug": "lambda-cold-start-avoid-hack",
"published_at": 1559084450,
"title": "VPC Lambdaのコールドスタートにお悩みの方へ捧ぐコールドスタート予防のハック Lambdaを定期実行するならメモリの割り当ては1600Mがオススメ?!",
"author": "iwata-tomoya"
}
},
{
"distance": 0.14367318153381348,
"key": "article-003",
"metadata": {
"title": "LambdaのProvisioned Concurrencyを使って、コールドスタート対策をしてみた #reinvent",
"language": "ja",
"published_at": 1576128882,
"author": "sato-naoya",
"slug": "lambda-provisioned-concurrency-coldstart"
}
},
{
"distance": 0.15859538316726685,
"key": "article-004",
"metadata": {
"language": "ja",
"title": "[速報]コールドスタート対策のLambda定期実行とサヨナラ!! LambdaにProvisioned Concurrencyの設定が追加されました #reinvent",
"author": "iwata-tomoya",
"slug": "lambda-support-provisioned-concurrency",
"published_at": 1575418456
}
},
{
"distance": 0.16454929113388062,
"key": "article-005",
"metadata": {
"language": "ja",
"published_at": 1669858986,
"slug": "session-lamba-snapstart",
"author": "hamada-koji",
"title": "Lambdaのコールドスタートを解決するLambda SnapStartのセッションに参加してきた(SVS320) #reinvent"
}
}
],
"distanceMetric": "cosine"
}
By specifying --return-distance and --return-metadata, I received distance scores and metadata (title, author, slug, publication date, language).
With cosine distance, lower values indicate higher similarity. Articles related to "Lambda cold start" appeared in the top results.
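The same flow can also be scripted end to end in Python. A minimal sketch combining the embedding call and query_vectors (the search_articles helper is my own naming; the request schema matches the one used in the CLI example above):

import json
import boto3

runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

def search_articles(query, top_k=5):
    # Vectorize the search term with the same model and schema as the index
    body = {
        'schemaVersion': 'nova-multimodal-embed-v1',
        'taskType': 'SINGLE_EMBEDDING',
        'singleEmbeddingParams': {
            'embeddingPurpose': 'GENERIC_INDEX',
            'embeddingDimension': 1024,
            'text': {'truncationMode': 'END', 'value': query}
        }
    }
    resp = runtime.invoke_model(
        modelId='amazon.nova-2-multimodal-embeddings-v1:0',
        contentType='application/json',
        body=json.dumps(body)
    )
    query_vector = json.loads(resp['body'].read())['embeddings'][0]['embedding']

    # Similarity search against the S3 Vectors index
    result = s3vectors.query_vectors(
        vectorBucketName='my-vectors-bucket',
        indexName='my-blog-index',
        queryVector={'float32': query_vector},
        topK=top_k,
        returnDistance=True,
        returnMetadata=True
    )
    for v in result['vectors']:
        print(f"{v['distance']:.4f}  {v['metadata']['title']}")

search_articles('Lambda コールドスタート 対策')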
Scaling to Full Volume
After confirming functionality with the 1,000-record pilot, I processed the full dataset (about 57,000 records) using the same procedure. Due to the 50,000-record limit per Batch Inference job, I split the input for submission.
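The split itself is a few lines. A sketch that writes embed_input.jsonl into numbered chunks under the per-job record limit (the file naming is my own):

# Split the batch input into chunks that fit the 50,000-record job limit
CHUNK = 50_000
with open('embed_input.jsonl') as f:
    lines = f.readlines()
for i in range(0, len(lines), CHUNK):
    with open(f'embed_input_{i // CHUNK:02d}.jsonl', 'w') as out:
        out.writelines(lines[i:i + CHUNK])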
| Item | 1,000-record Pilot | 57,000-record Full Volume |
|---|---|---|
| Batch Inference | ~15 min | ~30 min (split submission) |
| S3 Vectors Registration | ~10 sec | ~10 min |
| Embedding Cost | ~$0.002 | ~$0.12 |
Summary
By bulk vectorizing article summaries with Bedrock Batch Inference and registering them in S3 Vectors together with metadata, I built a semantic search system. Including title, author, slug, publication date, and language in the metadata allows article information to be retrieved at search time without a separate database lookup, and also enables metadata filtering.
With Amazon S3 Vectors × Bedrock (amazon.nova-2-multimodal-embeddings-v1:0), you can easily build a search infrastructure that's serverless and fully managed.
However, the maximum number of results per query is limited to 100, so different approaches may be necessary for complex weighting or hybrid search with large datasets. OpenSearch may be more appropriate when full-text search or complex ranking adjustments are required, but try S3 Vectors first if you want to test semantic search at low cost with minimal management overhead.