When I built the WebSocket backend for Twilio ConversationRelay using API Gateway + Lambda, I encountered an issue where it wouldn't respond during the initial call
Introduction
When building an AI voice response system with Twilio ConversationRelay, choosing the infrastructure for the WebSocket backend is one of the key design decisions.
Initially, we operated with an ECS/Fargate + ALB configuration, but this was costly for PoC purposes. After migrating to an API Gateway WebSocket API + Lambda configuration, our estimates showed we could reduce monthly costs by about 85% while simplifying operations.
However, immediately after this migration, we encountered an issue where the AI wouldn't respond to user speech during the first call. This article shares our investigation process and the solution to this problem.
Architecture
The post-migration architecture is as follows:
A single Lambda function handles all three routes: $connect, $default, and $disconnect. The design is stateless: session state is stored outside Lambda rather than in the execution environment.
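The single-function routing can be sketched as follows. This is a minimal illustration, not our actual handler: the event type is reduced to the fields used here, and the per-route helpers are placeholders for the real ConversationRelay and session logic.

```typescript
// Minimal shape of the fields we use from the API Gateway WebSocket event
// (defined inline to keep the sketch self-contained).
interface WsEvent {
  requestContext: { routeKey: string; connectionId: string };
  body?: string;
}

type WsResult = { statusCode: number; body: string };

// Illustrative per-route handlers; the real ones talk to ConversationRelay
// and persist session state externally.
async function onConnect(_connectionId: string): Promise<void> {}
async function onMessage(_connectionId: string, _body: string): Promise<void> {}
async function onDisconnect(_connectionId: string): Promise<void> {}

export async function handler(event: WsEvent): Promise<WsResult> {
  const { routeKey, connectionId } = event.requestContext;
  console.log("Lambda invoked", { routeKey, connectionId });

  switch (routeKey) {
    case "$connect":
      await onConnect(connectionId);
      break;
    case "$default":
      await onMessage(connectionId, event.body ?? "");
      break;
    case "$disconnect":
      await onDisconnect(connectionId);
      break;
  }
  // API Gateway expects a 2xx from $connect to accept the connection.
  return { statusCode: 200, body: "OK" };
}
```

Keeping all three routes in one function means one cold start affects them all, which matters later in this story.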
Comparing with the ECS configuration:
| Item | ECS/Fargate + ALB | API Gateway + Lambda |
|---|---|---|
| Monthly cost (1,000 calls) | About $130-150 | About $16-20 |
| Response time (prompt → answer) | About 3.5 seconds | About 2.5-4.3 seconds |
| Operational resources | VPC, ALB, ECS, etc. ~15 resources | API Gateway, Lambda, etc. ~5 resources |
| Cold start | None | About 600ms |
The Lambda configuration excels in cost efficiency and operational simplicity but, unlike ECS, is subject to cold starts.
Problem: AI Not Responding on First Call
After migrating to the Lambda configuration, we observed the following symptoms:
- On the first call after a period of inactivity, the welcomeGreeting (the greeting message played when the call is answered) would play, but the AI wouldn't respond to the user's speech
- Subsequent calls worked normally
This problem didn't occur with the ECS configuration, suggesting it was related to the Lambda configuration migration.
Investigation: CloudWatch Logs Analysis
Comparing Normal and Abnormal Logs
We compared CloudWatch Logs between calls that worked normally and those with issues.
Normal call (second or later):

```
Lambda invoked { routeKey: "$connect", connectionId: "XXXXX=" }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "setup" }
Session setup { sessionId: "XXXXX-...", callSid: "CAXXXXX..." }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "prompt" }
Processing prompt { utteranceLength: 24 }
Intent detected { intent: "NORMAL" }
RAG search completed { hitCount: 3, topScore: 0.72 }
Answer sent { answerLength: 185, totalDurationMs: 3812 }
```
Problematic call (first):

```
INIT_START Runtime Version: nodejs:22.v45
Lambda invoked { routeKey: "$connect", connectionId: "YYYYY=" }
Lambda invoked { routeKey: "$default", connectionId: "YYYYY=" }
Received message { type: "setup" }
Session setup { sessionId: "YYYYY-...", callSid: "CAYYYYY..." }
(... about 34 seconds pass without receiving a prompt message ...)
Lambda invoked { routeKey: "$disconnect", connectionId: "YYYYY=" }
```
In normal calls, a prompt message containing the user's speech arrives from ConversationRelay after setup, but in problematic calls, it doesn't arrive.
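This setup-then-prompt flow can be sketched as follows. The message shapes are simplified from the ConversationRelay WebSocket protocol (a prompt carries the transcribed speech in voicePrompt, and the backend replies with a { type: "text", token, last } message for TTS); sendToConnection stands in for API Gateway's PostToConnection call, and generateAnswer for our RAG pipeline, both of which are illustrative here.

```typescript
// Simplified ConversationRelay messages we care about (see Twilio's docs
// for the full protocol; other types such as "interrupt" are omitted).
type RelayMessage =
  | { type: "setup"; sessionId: string; callSid: string }
  | { type: "prompt"; voicePrompt: string; last: boolean };

// Stand-ins for API Gateway PostToConnection and the RAG answer pipeline.
type Sender = (connectionId: string, payload: string) => Promise<void>;
type AnswerFn = (utterance: string) => Promise<string>;

export async function handleRelayMessage(
  connectionId: string,
  raw: string,
  send: Sender,
  generateAnswer: AnswerFn
): Promise<void> {
  const msg = JSON.parse(raw) as RelayMessage;
  console.log("Received message", { type: msg.type });

  if (msg.type === "setup") {
    // Session identifiers would be persisted externally (e.g. DynamoDB),
    // since the handler itself is stateless.
    console.log("Session setup", { sessionId: msg.sessionId, callSid: msg.callSid });
    return;
  }

  if (msg.type === "prompt") {
    const answer = await generateAnswer(msg.voicePrompt);
    // Reply with a "text" message; ConversationRelay reads the token aloud via TTS.
    await send(
      connectionId,
      JSON.stringify({ type: "text", token: answer, last: true })
    );
  }
}
```

The failure we saw happens before this code ever runs for a prompt: the prompt message itself never arrives over the WebSocket.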
Relationship with INIT_START
INIT_START is logged when a Lambda cold start occurs. Classifying every call's logs by whether INIT_START appeared and whether a prompt message arrived revealed a clear pattern.
| Call | INIT_START | prompt received | Result |
|---|---|---|---|
| Call A | Yes (Init Duration: 650ms) | No | Abnormal |
| Call B | No | Yes | Normal |
| Call C | Yes (Init Duration: 720ms) | No | Abnormal |
| Call D | No | Yes | Normal |
| Call E | Yes (Init Duration: 603ms) | No | Abnormal |
| Call F | No | Yes | Normal |
Among approximately 25 calls we investigated, almost all calls with cold starts didn't receive a prompt.
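One way to produce this classification is a CloudWatch Logs Insights query over the Lambda log group. A sketch along these lines (the INIT_START line comes from Lambda's standard platform logging; the prompt filter assumes our structured log text):

```
fields @timestamp, @logStream
| filter @message like /INIT_START/ or @message like /"prompt"/
| sort @timestamp asc
```

Log streams that contain an INIT_START line but no prompt line correspond to the abnormal calls.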
When a cold start occurs, the response from the $connect handler is delayed by the Init Duration (about 650ms). While the $connect completes in about 10-20ms during warm starts, it takes about 650-720ms during cold starts. We concluded that this delay likely affects the initialization of STT (Speech-to-Text) on the ConversationRelay side, preventing the prompt message from being sent.
Solution: Lambda Warm-up Using EventBridge
To avoid cold starts, we set up an EventBridge schedule rule to call Lambda every 5 minutes, keeping the execution environment warm.
Detecting Warm-up in Lambda Handler
We detect calls from EventBridge and return a response immediately:
```typescript
export async function handler(
  event: APIGatewayProxyWebsocketEventV2 | Record<string, unknown>
): Promise<APIGatewayProxyResultV2> {
  // Detect warm-up calls from EventBridge
  if ('source' in event && event.source === 'aws.events') {
    console.log('Warmup invocation');
    return { statusCode: 200, body: 'Warm' };
  }
  // Normal WebSocket message processing below
  const wsEvent = event as APIGatewayProxyWebsocketEventV2;
  // ...
}
```
Terraform EventBridge Resource Definition
```hcl
# Call Lambda every 5 minutes to keep it warm
resource "aws_cloudwatch_event_rule" "lambda_warmup" {
  name                = "${var.project_name}-warmup-${var.environment}"
  description         = "Keep Lambda warm to avoid cold start issues with ConversationRelay STT"
  schedule_expression = "rate(5 minutes)"
}

resource "aws_cloudwatch_event_target" "lambda_warmup" {
  rule = aws_cloudwatch_event_rule.lambda_warmup.name
  arn  = aws_lambda_function.ws_handler.arn
}

resource "aws_lambda_permission" "warmup" {
  statement_id  = "AllowCloudWatchWarmup"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ws_handler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.lambda_warmup.arn
}
```
Verification
After applying the solution, we confirmed the following:
- Logs from EventBridge warm-up invocation:
```
INIT_START Runtime Version: nodejs:22.v45 Init Duration: 654.12 ms
Warmup invocation
REPORT RequestId: XXXXX Duration: 2.85 ms Billed Duration: 657 ms Init Duration: 654.12 ms
```
- Logs from a subsequent call (no Init Duration = warm start):
```
Lambda invoked { routeKey: "$connect", connectionId: "XXXXX=" }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "setup" }
Session setup { sessionId: "XXXXX-..." }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "prompt" }
Processing prompt { utteranceLength: 24 }
Answer sent { answerLength: 185, totalDurationMs: 3812 }
```
The warm-up avoided cold starts, allowing prompt messages to be received normally.
Conclusion
We investigated and resolved an issue where the AI wouldn't respond to user speech during the first call when using API Gateway + Lambda as a WebSocket backend for Twilio ConversationRelay. CloudWatch Logs indicated that Lambda cold starts were likely the cause. Implementing warm-ups every 5 minutes using EventBridge avoided cold starts and resolved the issue.
While the API Gateway + Lambda configuration offers significant cost and operational advantages compared to ECS, when combining it with services like ConversationRelay that are sensitive to WebSocket connection establishment speed, cold starts need to be considered.