I built a serverless, convenient semantic search platform with S3 Vectors and Bedrock
Amazon S3 Vectors is a service specialized for storing vector data and performing similarity searches. When combined with Amazon Bedrock's Embedding model, it allows you to build serverless semantic text searches (meaning-based similarity searches).
In this post, I verified how easily a search infrastructure can be built with this configuration, using roughly 60,000 DevelopersIO articles as the dataset. Here are the specific steps.
- Vectorization — Batch vectorization of article summaries using Bedrock Batch Inference
- S3 Vectors Registration — Register vectors + metadata (title, author) to the index
- Search — Vectorize search terms with AWS CLI → Similarity search with S3 Vectors → Output results
Architecture
Prerequisites
Input Data: articles.jsonl
I prepared article ID, title, author, slug, publication date, language, and summary in JSONL format with one article per line.
```json
{"id": "abc123def456ghi789jk01", "title": "dbt platform の Advanced CI で event_time による最適化機能を試してみた", "author": "yasuhara-tomoki", "slug": "dbt-platform-advanced-ci-event-time", "published_at": 1736920200, "language": "ja", "summary": "dbt platform Advanced CI機能は、Enterprise以上のプランで利用可能なCI最適化機能です。..."}
{"id": "xyz987wvu654tsr321qp02", "title": "CloudWatch Application Signals で複数の異なるルックバックウィンドウを持つバーンレートアラームを組み合わせつつ CDK で実装してみる", "author": "masukawa-kentaro", "slug": "cloudwatch-application-signals-burn-rate-alarm-cdk", "published_at": 1733797500, "language": "ja", "summary": "CloudWatch Application Signals でバーンレートアラームを複数のルックバックウィンドウで設..."}
```
| Field | Purpose |
|---|---|
| `id` | S3 Vectors key |
| `title` | Vectorization input + metadata (non-filterable) |
| `author` | Metadata (filterable) |
| `slug` | Metadata (filterable, for article URL construction) |
| `published_at` | Metadata (filterable, epoch seconds for range queries) |
| `language` | Metadata (filterable) |
| `summary` | Vectorization input (combined with title for vectorization) |
※ `published_at` is stored as epoch seconds (an integer) because the S3 Vectors filter comparison operators (`$gte` / `$lt`, etc.) only support numeric types.
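As a concrete illustration, a publication date can be converted to filterable epoch seconds like this (a minimal sketch; `to_epoch_seconds` is a hypothetical helper, not part of the original pipeline):

```python
from datetime import datetime, timezone

def to_epoch_seconds(date_str: str) -> int:
    """Convert a YYYY-MM-DD date to UTC epoch seconds for S3 Vectors range filters."""
    dt = datetime.strptime(date_str, '%Y-%m-%d').replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

print(to_epoch_seconds('2025-01-15'))  # 1736899200
```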
Model Selection: Amazon Nova Embed v1
I chose amazon.nova-2-multimodal-embeddings-v1:0 for this project. This new model was announced in November 2025 and is currently only available in us-east-1, requiring cross-region usage, but I prioritized performance. At $0.00002 per 1,000 tokens (with an additional 50% discount for Batch Inference), it's extremely cost-effective and suitable for bulk vectorization of tens of thousands of items. Being part of the Bedrock managed service ecosystem means no need to host or manage external models.
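To make the cost claim concrete, here is the arithmetic as a small sketch (the price and discount figures are the ones quoted above; verify current pricing before relying on them):

```python
PRICE_PER_1K_TOKENS = 0.00002  # on-demand price quoted above
BATCH_DISCOUNT = 0.5           # Batch Inference is billed at 50% of on-demand

def embedding_cost(tokens: int, batch: bool = True) -> float:
    """Estimate embedding cost in USD for a given token count."""
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    return cost * BATCH_DISCOUNT if batch else cost

# The 1,000-article pilot described below processed 116,945 tokens
print(f'${embedding_cost(116_945):.6f}')  # → $0.001169
```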
AWS Resources
| Resource | Value |
|---|---|
| Embedding model | amazon.nova-2-multimodal-embeddings-v1:0 |
| Vector dimensions | 1024 |
| S3 Vectors bucket | my-vectors-bucket |
| S3 Vectors index | my-blog-index |
| Batch S3 | my-batch-bucket |
| IAM role | BedrockBatchInferenceRole |
Step 1: Vectorization
To verify accuracy quickly, I ran each step with simple, manually executed Python scripts rather than building an automated pipeline.
1-1. Generate JSONL for Batch Inference
I converted articles.jsonl to the input format for Bedrock Batch Inference.
```python
import json

with open('articles.jsonl') as f_in, open('embed_input.jsonl', 'w') as f_out:
    for line in f_in:
        article = json.loads(line)
        text = f"{article['title']}\n{article['summary']}"
        record = {
            'recordId': article['id'],
            'modelInput': {
                'schemaVersion': 'nova-multimodal-embed-v1',
                'taskType': 'SINGLE_EMBEDDING',
                'singleEmbeddingParams': {
                    'embeddingPurpose': 'GENERIC_INDEX',
                    'embeddingDimension': 1024,
                    'text': {'truncationMode': 'END', 'value': text}
                }
            }
        }
        f_out.write(json.dumps(record, ensure_ascii=False) + '\n')
```
1-2. Upload to S3
```bash
aws s3 cp embed_input.jsonl s3://my-batch-bucket/input-rag/embed_input.jsonl
```
1-3. Submit Batch Inference Job
```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')
resp = bedrock.create_model_invocation_job(
    modelId='amazon.nova-2-multimodal-embeddings-v1:0',
    jobName='rag-vectorize-demo',
    roleArn='arn:aws:iam::<ACCOUNT_ID>:role/BedrockBatchInferenceRole',
    inputDataConfig={
        's3InputDataConfig': {
            's3Uri': 's3://my-batch-bucket/input-rag/embed_input.jsonl'
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 's3://my-batch-bucket/output-rag/'
        }
    }
)
job_arn = resp['jobArn']
print(f'Job ARN: {job_arn}')
```
1-4. Wait for Job Completion
```python
import time

while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f'status: {status}')
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)
```
Processing the 1,000 items (116,945 tokens) took about 15 minutes. The basic procedure for Batch Inference follows an earlier article.
Step 2: S3 Vectors Registration
2-1. Create Index (First Time Only)
I specified title as non-filterable (display only) and the other 4 items as filterable in the metadataConfiguration.
```python
import boto3

s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')
s3vectors.create_index(
    vectorBucketName='my-vectors-bucket',
    indexName='my-blog-index',
    dataType='float32',
    dimension=1024,
    distanceMetric='cosine',
    metadataConfiguration={
        'nonFilterableMetadataKeys': ['title']
    }
)
```
2-2. Combine Vectors + Metadata → Register
I combined the batch output vectors with metadata (title, author, slug, publication date, language) from articles.jsonl for registration.
```python
import json
import boto3

s3 = boto3.client('s3', region_name='us-east-1')
s3vectors = boto3.client('s3vectors', region_name='<YOUR_REGION>')

# Load metadata from articles.jsonl
article_meta = {}
with open('articles.jsonl') as f:
    for line in f:
        a = json.loads(line)
        article_meta[a['id']] = {
            'title': a['title'],
            'author': a['author'],
            'slug': a['slug'],
            'published_at': a['published_at'],
            'language': a['language']
        }

# Read batch output, combine with metadata and register
resp = s3.get_object(
    Bucket='my-batch-bucket',
    Key='output-rag/<JOB_ID>/embed_input.jsonl.out'
)
vectors = []
total = 0
for line in resp['Body'].iter_lines():
    result = json.loads(line)
    article_id = result['recordId']
    meta = article_meta.get(article_id, {})
    vectors.append({
        'key': article_id,
        'data': {'float32': result['modelOutput']['embeddings'][0]['embedding']},
        'metadata': {
            'title': meta.get('title', ''),
            'author': meta.get('author', ''),
            'slug': meta.get('slug', ''),
            'published_at': meta.get('published_at', 0),  # store as epoch seconds (int)
            'language': meta.get('language', '')
        }
    })
    # put_vectors accepts at most 100 vectors per call
    if len(vectors) >= 100:
        s3vectors.put_vectors(
            vectorBucketName='my-vectors-bucket',
            indexName='my-blog-index',
            vectors=vectors
        )
        total += len(vectors)
        vectors = []

# Flush the final partial batch
if vectors:
    s3vectors.put_vectors(
        vectorBucketName='my-vectors-bucket',
        indexName='my-blog-index',
        vectors=vectors
    )
    total += len(vectors)

print(f'Registration complete: {total} items')
```
Key points:
- I stored `title`, `author`, `slug`, `published_at`, and `language` in `metadata`. This allows retrieval of article metadata without querying a database during search
- `put_vectors` operates as an upsert: registering with the same key overwrites the vector and metadata
- `put_vectors` has a maximum of 100 items per call, so I registered in batches of 100
S3 Vectors Metadata Constraints
S3 Vectors metadata has these constraints:
| Item | Limit |
|---|---|
| Total metadata size | 40 KB / vector (filterable + non-filterable) |
| Filterable metadata | 2 KB / vector |
| Metadata key count | 50 keys / vector |
| Non-filterable key count | 10 keys / index |
| Supported types | string, number, boolean, list |
Filterable metadata is limited to 2 KB per vector. Metadata used only for display can be registered as non-filterable, which allows up to 40 KB. For that reason, I set `title` as non-filterable metadata.
Metadata Filtering
S3 Vectors supports filtering by metadata via the `filter` parameter of `query_vectors`. You pass the filter expression directly as a dict.
```python
# Search only for articles by a specific author
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"author": {"$eq": "wakatsuki-ryuta"}}
)
```
For date-range filtering, comparison operators such as `$gte` / `$lt` only support Number types. Storing `published_at` as epoch seconds (an integer) makes period filters possible.
```python
from datetime import datetime, timezone

# Only articles from 2026 onwards
epoch_2026 = int(datetime(2026, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    vectorBucketName=BUCKET_NAME,
    indexName=INDEX_NAME,
    queryVector={'float32': query_vector},
    topK=10,
    returnDistance=True,
    returnMetadata=True,
    filter={"published_at": {"$gte": epoch_2026}}
)

# Only articles from 2025 (range specification)
epoch_2025 = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
result = s3vectors.query_vectors(
    ...,
    filter={"published_at": {"$gte": epoch_2025, "$lt": epoch_2026}}
)
```
Main operators:
| Operator | Supported Types | Description |
|---|---|---|
| `$eq` | string, number, boolean | Exact match |
| `$ne` | string, number, boolean | Not equal |
| `$gt` / `$gte` | number | Greater than / greater than or equal |
| `$lt` / `$lte` | number | Less than / less than or equal |
| `$in` / `$nin` | array | Match any / match none |
| `$exists` | boolean | Check field existence |
| `$and` / `$or` | array of filters | Logical AND / OR |
For details, refer to Metadata filtering.
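These operators compose; for example, an author condition and a date range can be joined with `$and`. A minimal sketch (`build_filter` is a hypothetical helper name):

```python
def build_filter(author: str, start_epoch: int, end_epoch: int) -> dict:
    """Build a query_vectors filter: a specific author AND published in [start, end)."""
    return {
        "$and": [
            {"author": {"$eq": author}},
            {"published_at": {"$gte": start_epoch, "$lt": end_epoch}},
        ]
    }

f = build_filter("wakatsuki-ryuta", 1735689600, 1767225600)  # the year 2025
# pass as: s3vectors.query_vectors(..., filter=f)
```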
Step 3: Search
Search Using AWS CLI
You can search using just the AWS CLI without Python.
Vectorize Search Terms
```bash
# Create request body
cat > /tmp/embed_request.json << 'EOF'
{
  "schemaVersion": "nova-multimodal-embed-v1",
  "taskType": "SINGLE_EMBEDDING",
  "singleEmbeddingParams": {
    "embeddingPurpose": "GENERIC_INDEX",
    "embeddingDimension": 1024,
    "text": {
      "truncationMode": "END",
      "value": "Lambda コールドスタート 対策"
    }
  }
}
EOF

# Vectorize with Bedrock (pass Japanese via file)
aws bedrock-runtime invoke-model \
  --model-id amazon.nova-2-multimodal-embeddings-v1:0 \
  --region us-east-1 \
  --content-type application/json \
  --body fileb:///tmp/embed_request.json \
  /tmp/query_vector.json
```
Extract Query Vector and Search with S3 Vectors
```bash
# Extract vector array
python3 -c "
import json
d = json.load(open('/tmp/query_vector.json'))
print(json.dumps(d['embeddings'][0]['embedding']))
" > /tmp/query_vec_array.json

# Similarity search with S3 Vectors
aws s3vectors query-vectors \
  --vector-bucket-name my-vectors-bucket \
  --index-name my-blog-index \
  --query-vector "float32=$(cat /tmp/query_vec_array.json)" \
  --top-k 5 \
  --return-distance \
  --return-metadata \
  --region <YOUR_REGION>
```
Execution Result
```json
{
  "vectors": [
    {
      "distance": 0.1290004849433899,
      "key": "article-001",
      "metadata": {
        "title": "Lambdaのコールドスタート問題と対策について整理する",
        "published_at": 1732491853,
        "language": "ja",
        "author": "manabe-kenji",
        "slug": "lambda-coldstart-measures"
      }
    },
    {
      "distance": 0.1369316577911377,
      "key": "article-002",
      "metadata": {
        "language": "ja",
        "slug": "lambda-cold-start-avoid-hack",
        "published_at": 1559084450,
        "title": "VPC Lambdaのコールドスタートにお悩みの方へ捧ぐコールドスタート予防のハック Lambdaを定期実行するならメモリの割り当ては1600Mがオススメ?!",
        "author": "iwata-tomoya"
      }
    },
    {
      "distance": 0.14367318153381348,
      "key": "article-003",
      "metadata": {
        "title": "LambdaのProvisioned Concurrencyを使って、コールドスタート対策をしてみた #reinvent",
        "language": "ja",
        "published_at": 1576128882,
        "author": "sato-naoya",
        "slug": "lambda-provisioned-concurrency-coldstart"
      }
    },
    {
      "distance": 0.15859538316726685,
      "key": "article-004",
      "metadata": {
        "language": "ja",
        "title": "[速報]コールドスタート対策のLambda定期実行とサヨナラ!! LambdaにProvisioned Concurrencyの設定が追加されました #reinvent",
        "author": "iwata-tomoya",
        "slug": "lambda-support-provisioned-concurrency",
        "published_at": 1575418456
      }
    },
    {
      "distance": 0.16454929113388062,
      "key": "article-005",
      "metadata": {
        "language": "ja",
        "published_at": 1669858986,
        "slug": "session-lamba-snapstart",
        "author": "hamada-koji",
        "title": "Lambdaのコールドスタートを解決するLambda SnapStartのセッションに参加してきた(SVS320) #reinvent"
      }
    }
  ],
  "distanceMetric": "cosine"
}
```
By specifying --return-distance and --return-metadata, I received distance scores and metadata (title, author, slug, publication date, language).
With cosine distance, lower values indicate higher similarity. Articles related to "Lambda cold start" appeared at the top of the results.
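If you prefer a "higher is better" score, cosine similarity can be recovered as `1 - distance`. A small sketch over the result shape shown above (`to_similarity` is a hypothetical helper):

```python
def to_similarity(distance: float) -> float:
    # For a cosine index: cosine distance = 1 - cosine similarity
    return 1.0 - distance

hits = [
    {"key": "article-001", "distance": 0.1290},
    {"key": "article-002", "distance": 0.1369},
]
for h in hits:
    print(h["key"], round(to_similarity(h["distance"]), 4))
# article-001 scores higher (0.871 vs 0.8631)
```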
Scaling to Full Volume
After confirming operation with 1,000 pilot items, I processed the full volume (about 57,000 items) using the same procedure. Since Batch Inference has a limit of 50,000 items per job, I split the input for submission.
| Item | 1,000 Items Pilot | 57,000 Items Full Volume |
|---|---|---|
| Batch Inference | ~15 minutes | ~30 minutes (split submission) |
| S3 Vectors Registration | ~10 seconds | ~10 minutes |
| Embedding Cost | ~$0.002 | ~$0.12 |
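The per-job split can be sketched as a simple chunking helper (`chunk` is a hypothetical function name; the 50,000 limit is the per-job record limit mentioned above):

```python
def chunk(records: list, max_records: int = 50_000) -> list:
    """Split records into sublists that fit the Batch Inference per-job limit."""
    return [records[i:i + max_records] for i in range(0, len(records), max_records)]

# 57,000 records need two jobs
sizes = [len(c) for c in chunk(list(range(57_000)))]
print(sizes)  # [50000, 7000]
```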
Supplement: Supporting Other Embedding Models
For Titan Text Embeddings v2
The modelInput format differs from Nova. Titan uses a simpler format with inputText / dimensions.
```python
record = {
    'recordId': article['id'],
    'modelInput': {
        'inputText': f"{article['title']}\n{article['summary']}",
        'dimensions': 1024,
        'normalize': True
    }
}
```
The vector retrieval path also differs.
| Model | Vector Retrieval Path |
|---|---|
| Nova Embed v1 | modelOutput['embeddings'][0]['embedding'] |
| Titan Embeddings v2 | modelOutput['embedding'] |
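A small dispatch helper (hypothetical, written for this post) can hide the per-model difference when reading batch output, using the paths from the table above:

```python
def extract_embedding(model_id: str, model_output: dict) -> list:
    """Return the embedding vector using the model-specific output path."""
    if model_id.startswith('amazon.nova'):
        return model_output['embeddings'][0]['embedding']
    if model_id.startswith('amazon.titan'):
        return model_output['embedding']
    raise ValueError(f'unsupported model: {model_id}')

nova_out = {'embeddings': [{'embedding': [0.1, 0.2]}]}
titan_out = {'embedding': [0.3, 0.4]}
print(extract_embedding('amazon.nova-2-multimodal-embeddings-v1:0', nova_out))
print(extract_embedding('amazon.titan-embed-text-v2:0', titan_out))
```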
For Cohere Embed v4
Cohere Embed v4 doesn't support Batch Inference, but the real-time API (invoke_model) supports batch processing of up to 96 items at once, making it practical for processing tens of thousands of items.
```python
import boto3, json

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
resp = bedrock.invoke_model(
    modelId='us.cohere.embed-v4:0',
    body=json.dumps({
        "texts": ["Text 1", "Text 2", ...],  # up to 96 items
        "input_type": "search_document",
        "embedding_types": ["float"],
        "output_dimension": 1024
    })
)
vectors = json.loads(resp['body'].read())['embeddings']['float']
```
Notes:
- Inference profile required: directly specifying the model ID `cohere.embed-v4:0` results in a `ValidationException`. Use `us.cohere.embed-v4:0` (cross-region inference profile)
- Change `input_type` for search queries: use `"search_document"` for registration and `"search_query"` for search
- Throttling mitigation: implement exponential-backoff retries for bulk processing
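The throttling mitigation can be sketched as a generic retry wrapper (`with_backoff` is a hypothetical helper; in real code you would catch `botocore.exceptions.ClientError` and check for a `ThrottlingException` error code instead of a bare `Exception`):

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # narrow this to throttling errors in real code
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))

# usage sketch: with_backoff(lambda: bedrock.invoke_model(...))
```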
| Item | Nova Embed v1 | Titan Embeddings v2 | Cohere Embed v4 |
|---|---|---|---|
| Batch Inference | Supported | Supported | Not supported |
| Real-time batch | 1 at a time | 1 at a time | Up to 96/call |
| Model ID | `amazon.nova-2-multimodal-embeddings-v1:0` | `amazon.titan-embed-text-v2:0` | `us.cohere.embed-v4:0` |
| Vector retrieval path | `embeddings[0]['embedding']` | `embedding` | `embeddings['float']` |
| Registration/search distinction | None | None | Specify with `input_type` |
Note: Job Name Constraints
The `jobName` for `create_model_invocation_job` is restricted to the regular expression `[a-zA-Z0-9](-*[a-zA-Z0-9+\-.])*`. Underscores (`_`) cannot be used.
```python
# Error: underscore is not allowed in jobName
bedrock.create_model_invocation_job(jobName='titan-embed-summary_en', ...)

# OK
bedrock.create_model_invocation_job(jobName='titan-embed-summary-en', ...)
```
If your summary type names contain underscores, convert them to hyphens before using them in job names.
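A quick sanitizer plus a check against the documented pattern (a sketch; `sanitize_job_name` and `JOB_NAME_RE` are hypothetical names):

```python
import re

# Pattern quoted above for create_model_invocation_job jobName
JOB_NAME_RE = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9+\-.])*$')

def sanitize_job_name(name: str) -> str:
    """Replace underscores with hyphens so the name satisfies the jobName pattern."""
    return name.replace('_', '-')

name = sanitize_job_name('titan-embed-summary_en')
print(name, bool(JOB_NAME_RE.match(name)))  # titan-embed-summary-en True
```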
Conclusion
I built a semantic search by bulk vectorizing article summaries with Bedrock Batch Inference and registering them with metadata in S3 Vectors. Including title, author, slug, publication date, and language in the metadata allows retrieval of article information without querying a database during search, and enables filtering.
With Amazon S3 Vectors × Bedrock (amazon.nova-2-multimodal-embeddings-v1:0), you can easily build a search infrastructure that's serverless and fully managed.
However, the number of items retrievable per query is limited to 100, so different approaches might be needed for complex weighting or hybrid search on large-scale data. While OpenSearch is suitable for full-text search and complex ranking adjustments, try S3 Vectors if you want to experiment with low-cost, management-free semantic search first.