I tried creating an agent with a voice interface using Strands Agent's BidiAgent and the OpenAI Realtime API #AWSreInvent

2025.12.04

Hello! I'm Takakuni (@takakuni_) from the Cloud Business Division Consulting Department.

Strands Agent now supports bidirectional streaming (BidiAgent), enabling voice conversations.

https://aws.amazon.com/jp/about-aws/whats-new/2025/12/typescript-strands-agents-preview/

In this article, I'd like to try having a conversation in Japanese using this update with OpenAI's Realtime API.

BidiAgent

BidiAgent is an agent for bidirectional streaming. By using BidiAgent, you can perform continuous voice and text streaming while simultaneously executing tools.

Normally, you would import the agent with from strands import Agent and use it as follows:

from strands import Agent
from strands_tools import calculator

agent = Agent(tools=[calculator])

# Single request-response cycle
result = agent("Calculate 25 * 48")
print(result.message)  # "The result is 1200"

For BidiAgent, you import it with from strands.experimental.bidi import BidiAgent, BidiAudioIO.

import asyncio
from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands_tools import calculator

model = BidiNovaSonicModel()
agent = BidiAgent(model=model, tools=[calculator])
audio_io = BidiAudioIO()

async def main():
    # Persistent connection with continuous streaming
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())

Currently, BidiAgent supports the following three model providers (a quick sketch of switching between them follows the list):

  • Amazon Bedrock Nova Sonic
  • OpenAI Realtime API
  • Google Gemini Live
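
For reference, here is a minimal sketch of switching between the first two providers, using only the class names and constructor arguments that appear in this article (the Gemini Live class isn't covered here, so I've left it out):

from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.models.openai_realtime import BidiOpenAIRealtimeModel

# Amazon Bedrock Nova Sonic (uses your AWS credentials)
nova_model = BidiNovaSonicModel()

# OpenAI Realtime API (reads OPENAI_API_KEY; model_id and voice are the ones used later in this article)
openai_model = BidiOpenAIRealtimeModel(
    model_id="gpt-realtime",
    provider_config={"audio": {"voice": "coral"}},
)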

Trying it out

Now that we've briefly gone over BidiAgent, let's actually try having a conversation in Japanese.

Installing the SDK

I'll use uv as the package manager. Following the documentation, let's install the packages.

uv init
uv add "strands-agents-tools" "strands-agents[bidi,bidi-all,bidi-openai]"

https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/quickstart/

Registering environment variables

Register the OpenAI API Key via environment variables.

export OPENAI_API_KEY=your_api_key
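
The code later in this article doesn't pass the key explicitly, so it is assumed to be picked up from the environment. As a small optional check (not part of the SDK), you can fail fast if the variable is missing:

import os

# Optional sanity check: fail fast if the API key isn't set in this shell.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running main.py")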

Implementing the code

Let's implement the agent with BidiAgent. It could probably be trimmed down even further, but it's nice that it already fits in so few lines.

main.py
import asyncio

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
from strands.experimental.bidi.models.openai_realtime import BidiOpenAIRealtimeModel
from strands.experimental.bidi.tools import stop_conversation

from strands_tools import calculator

async def main() -> None:
    model = BidiOpenAIRealtimeModel(
        model_id="gpt-realtime",
        provider_config={
            "audio": {
                "voice": "coral",
            }
        },
    )
    # stop_conversation tool allows user to verbally stop agent execution.
    agent = BidiAgent(
        model=model,
        tools=[calculator, stop_conversation],
        system_prompt="You are a helpful assistant that can use the calculator tool to calculate numbers.",
    )

    audio_io = BidiAudioIO()
    text_io = BidiTextIO()
    await agent.run(inputs=[audio_io.input()], outputs=[audio_io.output(), text_io.output()])

if __name__ == "__main__":
    asyncio.run(main())

For the available provider_config options, refer to the following documentation:

https://strandsagents.com/latest/documentation/docs/user-guide/concepts/experimental/bidirectional-streaming/models/openai_realtime/
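
The tools list isn't limited to strands_tools, either. Here is a minimal sketch, assuming BidiAgent accepts custom tools defined with the core SDK's @tool decorator the same way the regular Agent does (the current_time function below is just an illustration):

from datetime import datetime

from strands import tool


@tool
def current_time() -> str:
    """Return the current local time as an ISO 8601 string."""
    return datetime.now().isoformat()


# Hypothetical usage: pass it alongside the other tools.
# agent = BidiAgent(model=model, tools=[calculator, stop_conversation, current_time], ...)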

I ran the code, and the conversation worked well.

Currently, the PC's speaker output can be picked up by the microphone as input, so I recommend using headphones when testing.

https://youtu.be/KwYzScpi3Hg

Summary

That's it for "Creating a Voice Interface Agent using Strands Agent's BidiAgent and OpenAI Realtime API."

It's great that you can create a voice interface agent with very few lines of code.

This was Takakuni (@takakuni_) from the Cloud Business Division Consulting Department!
