When I built the WebSocket backend for Twilio ConversationRelay using API Gateway + Lambda, I encountered an issue where it wouldn't respond during the initial call
Introduction
When building an AI voice response system with Twilio ConversationRelay, choosing the infrastructure for the WebSocket backend is one of the key design decisions.
Initially, we operated with an ECS/Fargate + ALB configuration, but this was costly for PoC purposes. After migrating to an API Gateway WebSocket API + Lambda configuration, our estimates showed we could reduce monthly costs by about 85% while simplifying operations.
However, immediately after this migration, we encountered an issue where the AI wouldn't respond to user speech during the first call. This article shares our investigation process and the solution to this problem.
Architecture
The post-migration architecture is as follows:
A single Lambda function handles all three routes: $connect, $default, and $disconnect. The design is stateless: session state is stored outside Lambda rather than in the execution environment.
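The single-function routing can be sketched as follows. This is a minimal illustration, not our actual handler: the event type is reduced to the fields used here, and the per-route helpers are placeholders for the real ConversationRelay and session logic.

```typescript
// Minimal shape of the fields we use from the API Gateway WebSocket event
// (defined inline to keep the sketch self-contained).
interface WsEvent {
  requestContext: { routeKey: string; connectionId: string };
  body?: string;
}

type WsResult = { statusCode: number; body: string };

// Illustrative per-route handlers; the real ones talk to ConversationRelay
// and persist session state externally.
async function onConnect(_connectionId: string): Promise<void> {}
async function onMessage(_connectionId: string, _body: string): Promise<void> {}
async function onDisconnect(_connectionId: string): Promise<void> {}

export async function handler(event: WsEvent): Promise<WsResult> {
  const { routeKey, connectionId } = event.requestContext;
  console.log("Lambda invoked", { routeKey, connectionId });

  switch (routeKey) {
    case "$connect":
      await onConnect(connectionId);
      break;
    case "$default":
      await onMessage(connectionId, event.body ?? "");
      break;
    case "$disconnect":
      await onDisconnect(connectionId);
      break;
  }
  // API Gateway expects a 2xx from $connect to accept the connection.
  return { statusCode: 200, body: "OK" };
}
```

Keeping all three routes in one function means one cold start affects them all, which matters later in this story.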
Comparing with the ECS configuration:
| Item | ECS/Fargate + ALB | API Gateway + Lambda |
|---|---|---|
| Monthly cost (1,000 calls) | About $130-150 | About $16-20 |
| Response time (prompt → answer) | About 3.5 seconds | About 2.5-4.3 seconds |
| Operational resources | VPC, ALB, ECS, etc. ~15 resources | API Gateway, Lambda, etc. ~5 resources |
| Cold start | None | About 600ms |
The Lambda configuration excels in cost efficiency and operational simplicity but, unlike ECS, is subject to cold starts.
Problem: AI Not Responding on First Call
After migrating to the Lambda configuration, we observed the following symptoms:
- On the first call after a period of inactivity, the welcomeGreeting (the greeting message played when the call is answered) would play, but the AI wouldn't respond to the user's speech
- Subsequent calls worked normally
This problem didn't occur with the ECS configuration, suggesting it was related to the Lambda configuration migration.
Investigation: CloudWatch Logs Analysis
Comparing Normal and Abnormal Logs
We compared CloudWatch Logs between calls that worked normally and those with issues.
Normal call (second or later):

```
Lambda invoked { routeKey: "$connect", connectionId: "XXXXX=" }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "setup" }
Session setup { sessionId: "XXXXX-...", callSid: "CAXXXXX..." }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "prompt" }
Processing prompt { utteranceLength: 24 }
Intent detected { intent: "NORMAL" }
RAG search completed { hitCount: 3, topScore: 0.72 }
Answer sent { answerLength: 185, totalDurationMs: 3812 }
```
Problematic call (first):

```
INIT_START Runtime Version: nodejs:22.v45
Lambda invoked { routeKey: "$connect", connectionId: "YYYYY=" }
Lambda invoked { routeKey: "$default", connectionId: "YYYYY=" }
Received message { type: "setup" }
Session setup { sessionId: "YYYYY-...", callSid: "CAYYYYY..." }
(... about 34 seconds pass without receiving a prompt message ...)
Lambda invoked { routeKey: "$disconnect", connectionId: "YYYYY=" }
```
In normal calls, a prompt message containing the user's speech arrives from ConversationRelay after setup, but in problematic calls, it doesn't arrive.
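This setup-then-prompt flow can be sketched as follows. The message shapes are simplified from the ConversationRelay WebSocket protocol (a prompt carries the transcribed speech in voicePrompt, and the backend replies with a { type: "text", token, last } message for TTS); sendToConnection stands in for API Gateway's PostToConnection call, and generateAnswer for our RAG pipeline, both of which are illustrative here.

```typescript
// Simplified ConversationRelay messages we care about (see Twilio's docs
// for the full protocol; other types such as "interrupt" are omitted).
type RelayMessage =
  | { type: "setup"; sessionId: string; callSid: string }
  | { type: "prompt"; voicePrompt: string; last: boolean };

// Stand-ins for API Gateway PostToConnection and the RAG answer pipeline.
type Sender = (connectionId: string, payload: string) => Promise<void>;
type AnswerFn = (utterance: string) => Promise<string>;

export async function handleRelayMessage(
  connectionId: string,
  raw: string,
  send: Sender,
  generateAnswer: AnswerFn
): Promise<void> {
  const msg = JSON.parse(raw) as RelayMessage;
  console.log("Received message", { type: msg.type });

  if (msg.type === "setup") {
    // Session identifiers would be persisted externally (e.g. DynamoDB),
    // since the handler itself is stateless.
    console.log("Session setup", { sessionId: msg.sessionId, callSid: msg.callSid });
    return;
  }

  if (msg.type === "prompt") {
    const answer = await generateAnswer(msg.voicePrompt);
    // Reply with a "text" message; ConversationRelay reads the token aloud via TTS.
    await send(
      connectionId,
      JSON.stringify({ type: "text", token: answer, last: true })
    );
  }
}
```

The failure we saw happens before this code ever runs for a prompt: the prompt message itself never arrives over the WebSocket.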
Relationship with INIT_START
INIT_START is logged when a Lambda cold start occurs. Classifying every call's logs by whether INIT_START appeared and whether a prompt message arrived revealed a clear pattern.
| Call | INIT_START | prompt received | Result |
|---|---|---|---|
| Call A | Yes (Init Duration: 650ms) | No | Abnormal |
| Call B | No | Yes | Normal |
| Call C | Yes (Init Duration: 720ms) | No | Abnormal |
| Call D | No | Yes | Normal |
| Call E | Yes (Init Duration: 603ms) | No | Abnormal |
| Call F | No | Yes | Normal |
Among approximately 25 calls we investigated, almost all calls with cold starts didn't receive a prompt.
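One way to produce this classification is a CloudWatch Logs Insights query over the Lambda log group. A sketch along these lines (the INIT_START line comes from Lambda's standard platform logging; the prompt filter assumes our structured log text):

```
fields @timestamp, @logStream
| filter @message like /INIT_START/ or @message like /"prompt"/
| sort @timestamp asc
```

Log streams that contain an INIT_START line but no prompt line correspond to the abnormal calls.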
When a cold start occurs, the response from the $connect handler is delayed by the Init Duration (about 650ms). While the $connect completes in about 10-20ms during warm starts, it takes about 650-720ms during cold starts. We concluded that this delay likely affects the initialization of STT (Speech-to-Text) on the ConversationRelay side, preventing the prompt message from being sent.
Solution: Lambda Warm-up Using EventBridge
To avoid cold starts, we set up an EventBridge schedule rule to call Lambda every 5 minutes, keeping the execution environment warm.
Detecting Warm-up in Lambda Handler
We detect calls from EventBridge and return a response immediately:
```typescript
export async function handler(
  event: APIGatewayProxyWebsocketEventV2 | Record<string, unknown>
): Promise<APIGatewayProxyResultV2> {
  // Detect warm-up calls from EventBridge
  if ('source' in event && event.source === 'aws.events') {
    console.log('Warmup invocation');
    return { statusCode: 200, body: 'Warm' };
  }
  // Normal WebSocket message processing below
  const wsEvent = event as APIGatewayProxyWebsocketEventV2;
  // ...
}
```
Terraform EventBridge Resource Definition
```hcl
# Call Lambda every 5 minutes to keep it warm
resource "aws_cloudwatch_event_rule" "lambda_warmup" {
  name                = "${var.project_name}-warmup-${var.environment}"
  description         = "Keep Lambda warm to avoid cold start issues with ConversationRelay STT"
  schedule_expression = "rate(5 minutes)"
}

resource "aws_cloudwatch_event_target" "lambda_warmup" {
  rule = aws_cloudwatch_event_rule.lambda_warmup.name
  arn  = aws_lambda_function.ws_handler.arn
}

resource "aws_lambda_permission" "warmup" {
  statement_id  = "AllowCloudWatchWarmup"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ws_handler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.lambda_warmup.arn
}
```
Verification
After applying the solution, we confirmed the following:
- Logs from EventBridge warm-up invocation:
```
INIT_START Runtime Version: nodejs:22.v45 Init Duration: 654.12 ms
Warmup invocation
REPORT RequestId: XXXXX Duration: 2.85 ms Billed Duration: 657 ms Init Duration: 654.12 ms
```
- Logs from a subsequent call (no Init Duration = warm start):
```
Lambda invoked { routeKey: "$connect", connectionId: "XXXXX=" }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "setup" }
Session setup { sessionId: "XXXXX-..." }
Lambda invoked { routeKey: "$default", connectionId: "XXXXX=" }
Received message { type: "prompt" }
Processing prompt { utteranceLength: 24 }
Answer sent { answerLength: 185, totalDurationMs: 3812 }
```
The warm-up avoided cold starts, allowing prompt messages to be received normally.
Conclusion
We investigated and resolved an issue where the AI wouldn't respond to user speech during the first call when using API Gateway + Lambda as a WebSocket backend for Twilio ConversationRelay. CloudWatch Logs indicated that Lambda cold starts were likely the cause. Implementing warm-ups every 5 minutes using EventBridge avoided cold starts and resolved the issue.
While the API Gateway + Lambda configuration offers significant cost and operational advantages compared to ECS, when combining it with services like ConversationRelay that are sensitive to WebSocket connection establishment speed, cold starts need to be considered.