I tried automatic transcription of call recordings with Twilio Conversational Intelligence and Functions to obtain summaries and sentiment analysis
Introduction
In this article, I will introduce the procedure to automatically transcribe voice call recordings, and obtain summaries, sentiment analysis, and custom classification results by combining Twilio Conversational Intelligence (hereinafter, Conversational Intelligence) and Twilio Functions.
What is Twilio
Twilio is a cloud communication platform that provides features such as voice calls, messaging, email, and contact centers as APIs. Developers can flexibly add features like calls, chat, and authentication to their own services by integrating Twilio's APIs.
Target Audience
- People who have used Twilio's Voice API or Twilio Studio and want to utilize call content as text
- People who want to structure and analyze contact center or campaign hotline calls in the form of summaries or sentiment analysis
- People who want to understand the overall picture of Conversational Intelligence and the actual API calls and Function configuration
Overview of Conversational Intelligence
Conversational Intelligence is a service that transcribes conversations from voice calls and messages, and makes them available as structured data through AI-based language analysis. A key feature is the ability to apply not just transcription, but also sentiment analysis, summarization, topic extraction, entity extraction, and other processing all at once. The analysis results can be retrieved from APIs in the form of Transcripts or Language Operators, and can be linked to existing business systems or dashboards.
Supported Channels and Data Flow
Conversational Intelligence ingests conversation data from the following channels:
- Voice (telephone)
  - Twilio Recordings (recording files from Twilio Programmable Voice)
  - External Recordings (audio files recorded by third parties)
  - Calls (real-time transcription of ongoing calls)
  - ConversationRelay (call logs with AI agents)
- Messaging (SMS)
  - Twilio Conversations (SMS, WhatsApp, WebChat, etc.)
Developers create an Intelligence Service in their application and link the Service to target calls or messages. This allows voice and text flowing from Twilio Voice or Conversations API to be automatically stored as Transcripts, and analysis results from Language Operators (described later) can be obtained together.
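In this article the Intelligence Service is created from the Console (see below), but for reference, a minimal sketch of creating one with the Twilio Node helper library might look like the following. The uniqueName value test-ci-api is just an example.
// Sketch: creating an Intelligence Service programmatically (the Console is used
// in the rest of this article). The uniqueName is an arbitrary example value.
const twilio = require("twilio");
const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

async function createIntelligenceService() {
  const service = await client.intelligence.v2.services.create({
    uniqueName: "test-ci-api",
  });
  // The returned SID (GAxxxxxxxx...) is what later steps refer to as serviceSid
  console.log("Created Intelligence Service:", service.sid);
}

createIntelligenceService();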
Intelligence Service and Language Operator
In Conversational Intelligence, Intelligence Service is provided as the central configuration unit. An Intelligence Service has the following information:
- Target account
- Language to use (LanguageCode)
- Automatic transcription settings (AutoTranscribe)
- Automatic PII masking settings (AutoRedaction, MediaRedaction)
- Data logging settings
- Webhook URL and HTTP method
- Public key for encryption (EncryptionCredentialSid)
- List of Language Operators to link
Language Operator is the unit of analysis processing that is executed on a Transcript. The following Pre-built Operators are typically provided:
- Conversation Summary
- Sentiment Analysis
- Entity Extraction, etc.
Furthermore, using Generative Custom Operators allows developers to define arbitrary prompts and JSON schemas to perform flexible LLM-based analysis and classification.
Actually Using Twilio Conversational Intelligence
From here, I will introduce a configuration that creates an Intelligence Service and performs recording and analysis from Twilio Functions. In this validation, we will implement the following flow:
- When a call is made to a Twilio phone number, the Twilio Function plays guidance in Japanese and records the speech content
- From the callback at recording completion, the Conversational Intelligence Transcript API is called to send the recording to the Intelligence Service
- After transcript generation and analysis is complete, the Intelligence Service notifies the Twilio Function via Webhook
- Obtain the TranscriptSid from the Function logs and retrieve the Transcript itself and Language Operator results from a local Node.js script
Creating an Intelligence Service
First, we create an Intelligence Service, which forms the foundation for Conversational Intelligence. From Conversational Intelligence > Intelligence Service, click Create a Service to create a Service.

- Unique name: any identifier such as test-ci
- Language: English

- Auto transcribe: Off, since we will call the Transcript API manually when the recording completes
- PII Redaction: Off for testing purposes

After creating the Service, configure the following settings in the Webhook tab.
- Webhook URL: the URL of the /ci-webhook Function described later
- Webhook HTTP method: POST

In the Language Operators tab, add Conversation Summary and Sentiment Analysis.


Once added, it will display Added to Service.

Next, click Create custom operator to create a custom operator.

Select the Generative type and enter the following content in the prompt:
この会話は顧客からの製品に関する問合せです。 会話の内容から、フィードバックの種類を category として pricing / other のいずれかで分類してください。 ユーザーの感情を emotion として positive / neutral / negative のいずれかで分類してください。
(In English: This conversation is an inquiry from a customer about a product. Based on the conversation, classify the type of feedback as category, either pricing or other, and classify the user's emotion as emotion, one of positive, neutral, or negative.)
Set the Output format to JSON and enter the following content:
{
  "type": "object",
  "properties": {
    "category": {
      "type": "string"
    },
    "emotion": {
      "type": "string"
    }
  }
}
Enter the following in Training examples:
| Example conversation | Expected output results |
|---|---|
| How much your product? | {"category":"pricing","emotion":"neutral"} |
| We cannot find your products. | {"category":"other","emotion":"negative"} |
By defining it this way, a JSON with two fields, category and emotion, will be returned for each Transcript.

After creating the Service, make note of the Service SID (for example, GAxxxxxxxx...). It will be used as serviceSid in the Twilio Function described later.
Recording Calls with Twilio Function
Set the environment variables as follows:
| Variable Name | Value |
|---|---|
| RECORDING_STATUS_URL | Public URL of the /ci-create-transcript Function described later |
| INTELLIGENCE_SERVICE_SID | SID of Intelligence Service |
The first Function /incoming-call plays guidance for incoming calls and records the speech. To pass the RecordingSid to the next Function after recording is complete, it specifies recordingStatusCallback.
/incoming-call
exports.handler = function (context, event, callback) {
const twiml = new Twilio.twiml.VoiceResponse();
// If RecordingSid is provided, it means this is called again after recording is complete
if (event.RecordingSid) {
// On the second call, just thank the caller and hang up
twiml.say(
{
language: "ja-JP",
voice: "woman",
},
"ありがとうございました。"
);
twiml.hangup();
} else {
// On the first call, play guidance and start recording
twiml.say(
{
language: "ja-JP",
},
"ピーという音のあとに話してください。"
);
twiml.record({
// Callback to /ci-create-transcript (described later) when recording is complete
recordingStatusCallback: context.RECORDING_STATUS_URL,
recordingStatusCallbackEvent: ["completed"],
recordingChannels: "dual",
timeout: 5,
maxLength: 120,
playBeep: true,
});
}
return callback(null, twiml);
};
- By specifying recordingChannels: "dual", the recording is created with a separate channel for each speaker. Conversational Intelligence can generate transcripts with speaker identification from dual-channel recordings.
- maxLength and timeout are set short for testing purposes. Adjust them according to your use case in a production environment.
After creating the Function, register it as the handler for A call comes in under the Voice Configuration of your purchased Twilio phone number.

Function to Create Transcript from Completed Recording
The next Function /ci-create-transcript calls the Transcript API from the recording completion callback.
/ci-create-transcript
exports.handler = async function (context, event, callback) {
const client = context.getTwilioClient();
// Values passed from Voice recordingStatusCallback
const recordingSid = event.RecordingSid; // Example: REXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
const callSid = event.CallSid;
console.log("RecordingSid:", recordingSid, "CallSid:", callSid);
try {
const transcript = await client.intelligence.v2.transcripts.create({
serviceSid: context.INTELLIGENCE_SERVICE_SID,
channel: {
media_properties: {
// Pass Twilio Recording to Conversational Intelligence
source_sid: recordingSid,
},
},
// Add arbitrary key for easier search later
customerKey: callSid,
});
console.log("Queued transcript:", transcript.sid, "status:", transcript.status);
// An empty response is sufficient for Twilio Function
return callback(null, {});
} catch (err) {
console.error("Failed to create transcript", err);
return callback(err);
}
};
- Specifying the RecordingSid in channel.media_properties.source_sid passes the Twilio recording file to Conversational Intelligence
- Putting the CallSid in customerKey makes it easier to search for the Transcript later or link it to external systems
This call is asynchronous, and the analysis is not yet complete when the return value's status is queued. When the Transcript generation and Language Operator execution are complete, an event is sent to the Webhook set in the Intelligence Service.
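If you prefer not to depend on the Webhook, another option is to poll the Transcript status from the API. Below is a minimal sketch that reuses the same Transcript fetch shown later in this article; the polling interval and retry count are arbitrary example values.
// Sketch: poll the Transcript until analysis finishes instead of waiting for the
// Webhook. Status typically moves from queued to in-progress to completed (or failed).
async function waitForTranscript(client, transcriptSid) {
  for (let attempt = 0; attempt < 30; attempt++) {
    const transcript = await client.intelligence.v2
      .transcripts(transcriptSid)
      .fetch();
    if (transcript.status === "completed" || transcript.status === "failed") {
      return transcript;
    }
    // Wait 10 seconds before the next check (arbitrary interval)
    await new Promise((resolve) => setTimeout(resolve, 10000));
  }
  throw new Error("Transcript did not finish within the polling window");
}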
Function to Receive TranscriptSid from Webhook
The final Function /ci-webhook is registered as the Webhook URL for the Intelligence Service and is called when the Transcript analysis is complete.
/ci-webhook
exports.handler = async function (context, event, callback) {
const eventType = event.event_type;
// When Transcript is complete, voice_intelligence_transcript_available is sent
if (eventType === "voice_intelligence_transcript_available") {
console.log("[CI] Transcript available.");
console.log(" TranscriptSid:", event.transcript_sid);
console.log(" CustomerKey :", event.customer_key);
return callback(null, { ok: true });
}
console.log("[CI] Unhandled event_type:", eventType);
return callback(null, { ok: true });
};
- A guard is in place so that processing happens only when event_type is voice_intelligence_transcript_available
- This Function only outputs the TranscriptSid and CustomerKey to the logs; the actual retrieval of the Transcript / OperatorResults is done from a local Node.js script (an alternative that fetches the results inside the webhook is sketched below)
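For reference, a minimal sketch of that alternative, fetching the OperatorResults inside the webhook Function itself, could look like this. It reuses the same API call as the local script shown later.
// Sketch: a variant of /ci-webhook that fetches OperatorResults directly instead of
// only logging the TranscriptSid. Not used in this article's verification flow.
exports.handler = async function (context, event, callback) {
  // Process only the completion event, as in the original /ci-webhook
  if (event.event_type !== "voice_intelligence_transcript_available") {
    return callback(null, { ok: true });
  }
  const client = context.getTwilioClient();
  try {
    const results = await client.intelligence.v2
      .transcripts(event.transcript_sid)
      .operatorResults.list({ limit: 20 });
    results.forEach((r) => {
      console.log("[CI]", r.name, r.operatorType);
    });
    return callback(null, { ok: true });
  } catch (err) {
    console.error("Failed to fetch operator results", err);
    return callback(err);
  }
};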
Operation Verification Flow
1. Turn ON Live logs for Functions.

2. Make an actual phone call, follow the guidance, speak, and end the call.
   Example speech: We cannot log in to your account system.

3. Confirm that a Transcript has been generated in the Conversational Intelligence screen of the Twilio console.

4. Confirm that the TranscriptSid is output in the Function's Live logs.

   Dec 10, 2025, 07:15:44 PM  Fetching content for /ci-webhook
   Dec 10, 2025, 07:16:03 PM  Execution started...
   Dec 10, 2025, 07:16:09 PM  [CI] Transcript available.
   Dec 10, 2025, 07:16:09 PM  TranscriptSid: GT****  ← Use this
   Dec 10, 2025, 07:16:09 PM  CustomerKey : CA****
   Dec 10, 2025, 07:16:09 PM  Execution ended in 4.8ms using 107MB

5. Run the following script from your local environment to confirm that the Transcript and OperatorResults metadata can be retrieved.
ci-dump-operator-results.js
require("dotenv").config();
const twilio = require("twilio");
const accountSid = process.env.TWILIO_ACCOUNT_SID;
const authToken = process.env.TWILIO_AUTH_TOKEN;
const transcriptSid = process.env.TRANSCRIPT_SID; // Set the TranscriptSid from earlier here
if (!accountSid || !authToken || !transcriptSid) {
console.error("TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN / TRANSCRIPT_SID are not set.");
process.exit(1);
}
const client = twilio(accountSid, authToken);
async function main() {
console.log("Target TranscriptSid:", transcriptSid);
// First check the Transcript summary
try {
const transcript = await client.intelligence.v2
.transcripts(transcriptSid)
.fetch();
console.log("=== Transcript info ===");
console.log("sid:", transcript.sid);
console.log("status:", transcript.status);
console.log("customerKey:", transcript.customerKey);
console.log("serviceSid:", transcript.serviceSid);
console.log("url:", transcript.url);
} catch (err) {
console.error("Failed to fetch transcript:", err);
process.exit(1);
}
// Then get all OperatorResults and output line by line
try {
const operatorResults = await client.intelligence.v2
.transcripts(transcriptSid)
.operatorResults.list({ limit: 20 });
console.log("OperatorResults count:", operatorResults.length);
operatorResults.forEach((r, i) => {
console.log(`\n--- OperatorResult[${i}] ---`);
console.log("operatorType:", r.operatorType);
console.log("name:", r.name);
console.log("operatorSid:", r.operatorSid);
console.log("extractMatch:", r.extractMatch);
console.log("matchProbability:", r.matchProbability);
console.log("normalizedResult:", r.normalizedResult);
console.log("utteranceMatch:", r.utteranceMatch);
console.log("predictedLabel:", r.predictedLabel);
console.log("predictedProbability:", r.predictedProbability);
console.log(
"labelProbabilities:",
r.labelProbabilities
? JSON.stringify(r.labelProbabilities, null, 2)
: r.labelProbabilities
);
console.log(
"extractResults:",
r.extractResults
? JSON.stringify(r.extractResults, null, 2)
: r.extractResults
);
console.log(
"utteranceResults:",
r.utteranceResults
? JSON.stringify(r.utteranceResults, null, 2)
: r.utteranceResults
);
console.log(
"textGenerationResults:",
r.textGenerationResults
? JSON.stringify(r.textGenerationResults, null, 2)
: r.textGenerationResults
);
console.log(
"jsonResults:",
r.jsonResults ? JSON.stringify(r.jsonResults, null, 2) : r.jsonResults
);
console.log("transcriptSid:", r.transcriptSid);
console.log("url:", r.url);
});
} catch (err) {
console.error("Failed to fetch operator results:", err);
process.exit(1);
}
}
main();
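To run the script, create a .env file containing TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, and TRANSCRIPT_SID (the GT**** value noted from the Live logs), then execute node ci-dump-operator-results.js.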
Example of execution results
Target TranscriptSid: GT****
=== Transcript info ===
sid: GT****
status: completed
customerKey: CA****
serviceSid: GA****
url: https://intelligence.twilio.com/v2/Transcripts/GT****
OperatorResults count: 3
--- OperatorResult[0] ---
operatorType: json
name: test-operator
operatorSid: LY****
extractMatch: null
matchProbability: null
normalizedResult: null
utteranceMatch: null
predictedLabel: null
predictedProbability: null
labelProbabilities: {}
extractResults: {}
utteranceResults: []
textGenerationResults: null
jsonResults: {
"category": "other",
"emotion": "negative"
}
transcriptSid: GT****
url: https://intelligence.twilio.com/v2/Transcripts/GT****/OperatorResults/LY****
--- OperatorResult[1] ---
operatorType: text-generation
name: Conversation Summary
operatorSid: LY****
extractMatch: null
matchProbability: null
normalizedResult: null
utteranceMatch: null
predictedLabel: null
predictedProbability: null
labelProbabilities: {}
extractResults: {}
utteranceResults: []
textGenerationResults: {
"format": "text",
"result": "The customer is experiencing issues logging into their account. The call center agent informs the customer that they are unable to access the account system. The conversation revolves around troubleshooting the login problem."
}
jsonResults: null
transcriptSid: GT****
url: https://intelligence.twilio.com/v2/Transcripts/GT****/OperatorResults/LY****
--- OperatorResult[2] ---
operatorType: conversation-classify
name: Sentiment Analysis
operatorSid: LY****
extractMatch: null
matchProbability: null
normalizedResult: null
utteranceMatch: null
predictedLabel: neutral
predictedProbability: 1.0
labelProbabilities: {
"neutral": 1
}
extractResults: {}
utteranceResults: []
textGenerationResults: null
jsonResults: null
transcriptSid: GT****
url: https://intelligence.twilio.com/v2/Transcripts/GT****/OperatorResults/LY****
Summary
In this article, I introduced a simple configuration that combines Twilio Conversational Intelligence and Twilio Functions to automatically analyze voice call recordings. By linking Language Operators to an Intelligence Service, you can obtain insights such as summaries, sentiment analysis, and custom classifications all at once just by providing a recording file.
This article focused mainly on retrieving the OperatorResults of a Transcript from a local Node.js script. As next steps, you could get even more value out of Conversational Intelligence by analyzing sentence-level data from Transcripts in more detail, or by automating workflows in combination with other Twilio products (ConversationRelay, Studio, SendGrid, etc.).
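As a pointer for the sentence-level analysis mentioned above, a minimal sketch using the Transcript's Sentences subresource might look like the following; the field names used here (mediaChannel, transcript) are assumptions based on the dual-channel setup in this article and may differ in your environment.
// Sketch: retrieving sentence-level data for a Transcript (field names are assumed).
require("dotenv").config();
const twilio = require("twilio");

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

async function dumpSentences(transcriptSid) {
  const sentences = await client.intelligence.v2
    .transcripts(transcriptSid)
    .sentences.list({ limit: 100 });
  sentences.forEach((s) => {
    // With a dual-channel recording, mediaChannel distinguishes the speakers
    console.log(`[ch ${s.mediaChannel}] ${s.transcript}`);
  });
}

dumpSentences(process.env.TRANSCRIPT_SID);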