I tried creating an agent with a voice interface using Strands Agent's BidiAgent and the OpenAI Realtime API #AWSreInvent

2025.12.04

Hello! I'm Takakuni (@takakuni_) from the Cloud Business Division Consulting Department.

Strands Agent now supports bidirectional streaming (BidiAgent), enabling voice conversations.

https://aws.amazon.com/jp/about-aws/whats-new/2025/12/typescript-strands-agents-preview/

In this article, I'd like to try having a conversation in Japanese using this update with OpenAI's Realtime API.

BidiAgent

BidiAgent is an agent for bidirectional streaming. By using BidiAgent, you can perform continuous voice and text streaming while simultaneously executing tools.

Normally, you would import the agent with from strands import Agent and use it as follows:

from strands import Agent
from strands_tools import calculator

agent = Agent(tools=[calculator])

# Single request-response cycle
result = agent("Calculate 25 * 48")
print(result.message)  # "The result is 1200"

For BidiAgent, you import it with from strands.experimental.bidi import BidiAgent, BidiAudioIO.

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands_tools import calculator

model = BidiNovaSonicModel()
agent = BidiAgent(model=model, tools=[calculator])
audio_io = BidiAudioIO()

async def main():
    # Persistent connection with continuous streaming
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

Currently, BidiAgent supports the following three model providers (a quick sketch of switching between them follows the list):

  • Amazon Bedrock Nova Sonic
  • OpenAI Realtime API
  • Google Gemini Live
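
For reference, here is a minimal sketch of switching between the first two providers, using only the class names and constructor arguments that appear in this article (the Gemini Live class isn't covered here, so I've left it out):

from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.models.openai_realtime import BidiOpenAIRealtimeModel

# Amazon Bedrock Nova Sonic (uses your AWS credentials)
nova_model = BidiNovaSonicModel()

# OpenAI Realtime API (reads OPENAI_API_KEY; model_id and voice are the ones used later in this article)
openai_model = BidiOpenAIRealtimeModel(
    model_id="gpt-realtime",
    provider_config={"audio": {"voice": "coral"}},
)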

Trying it out

Now that we've briefly gone over BidiAgent, let's actually try having a conversation in Japanese.

Installing the SDK

I'll use uv as the package manager. Following the documentation, let's install the packages.

uv init
uv add "strands-agents-tools" "strands-agents[bidi,bidi-all,bidi-openai]"

https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart/

Registering environment variables

Register the OpenAI API Key via environment variables.

export OPENAI_API_KEY=your_api_key
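
The code later in this article doesn't pass the key explicitly, so it is assumed to be picked up from the environment. As a small optional check (not part of the SDK), you can fail fast if the variable is missing:

import os

# Optional sanity check: fail fast if the API key isn't set in this shell.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running main.py")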

Implementing the code

Let's implement the agent with BidiAgent. It could probably be trimmed down even further, but it's nice that it already fits in so few lines.

main.py
import asyncio

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
from strands.experimental.bidi.models.openai_realtime import BidiOpenAIRealtimeModel
from strands.experimental.bidi.tools import stop_conversation

from strands_tools import calculator

async def main() -> None:
    model = BidiOpenAIRealtimeModel(
        model_id="gpt-realtime",
        provider_config={
            "audio": {
                "voice": "coral",
            }
        },
    )
    # stop_conversation tool allows user to verbally stop agent execution.
    agent = BidiAgent(
        model=model,
        tools=[calculator, stop_conversation],
        system_prompt="You are a helpful assistant that can use the calculator tool to calculate numbers.",
    )

    audio_io = BidiAudioIO()
    text_io = BidiTextIO()
    await agent.run(inputs=[audio_io.input()], outputs=[audio_io.output(), text_io.output()])

if __name__ == "__main__":
    asyncio.run(main())

For the available provider_config options, refer to the following documentation:

https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/models/openai_realtime/
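
The tools list isn't limited to strands_tools, either. Here is a minimal sketch, assuming BidiAgent accepts custom tools defined with the core SDK's @tool decorator the same way the regular Agent does (the current_time function below is just an illustration):

from datetime import datetime

from strands import tool


@tool
def current_time() -> str:
    """Return the current local time as an ISO 8601 string."""
    return datetime.now().isoformat()


# Hypothetical usage: pass it alongside the other tools.
# agent = BidiAgent(model=model, tools=[calculator, stop_conversation, current_time], ...)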

I ran the code, and the conversation worked well.

Currently, the PC's speaker output can be picked up by the microphone as input, so I recommend using headphones when testing.

https://youtu.be/KwYzScpi3Hg

Summary

That's it for "Creating a Voice Interface Agent using Strands Agent's BidiAgent and OpenAI Realtime API."

It's great that you can create a voice interface agent with very few lines of code.

This was Takakuni (@takakuni_) from the Cloud Business Division Consulting Department!
