
I tried creating an agent with voice interface using Strands Agent's BidiAgent and OpenAI Realtime API #AWSreInvent
Hello! I'm Takakuni (@takakuni_) from the Cloud Business Division Consulting Department.
Strands Agent now supports bidirectional streaming (BidiAgent), enabling voice conversations.
In this article, I'd like to try having a conversation in Japanese using this update with OpenAI's Realtime API.
BidiAgent
BidiAgent is an agent for bidirectional streaming. By using BidiAgent, you can perform continuous voice and text streaming while simultaneously executing tools.
Normally, you would import Agent from strands and define an agent as follows:
from strands import Agent
from strands_tools import calculator
agent = Agent(tools=[calculator])
# Single request-response cycle
result = agent("Calculate 25 * 48")
print(result.message) # "The result is 1200"
For BidiAgent, you instead import from strands.experimental.bidi:
import asyncio

from strands.experimental.bidi import BidiAgent, BidiAudioIO
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands_tools import calculator

model = BidiNovaSonicModel()
agent = BidiAgent(model=model, tools=[calculator])
audio_io = BidiAudioIO()

async def main():
    # Persistent connection with continuous streaming
    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output()]
    )

asyncio.run(main())
Currently, BidiAgent supports the following three model providers:
- Amazon Bedrock Nova Sonic
- OpenAI Realtime API
- Google Gemini Live
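If you want to switch between these backends easily, you could wrap model construction in a small factory. This is just a sketch, not part of the SDK: the Nova Sonic and OpenAI class paths come from the examples in this article, while the Gemini class name (`BidiGeminiLiveModel`) is my guess and should be verified against the Strands documentation. Lazy imports keep the unused providers optional:

```python
def make_bidi_model(provider: str, **kwargs):
    """Return a bidirectional-streaming model for the given provider.

    Sketch only: the "nova" and "openai" class paths match this
    article's examples; the "gemini" path is an unverified assumption.
    """
    if provider == "nova":
        from strands.experimental.bidi.models import BidiNovaSonicModel
        return BidiNovaSonicModel(**kwargs)
    if provider == "openai":
        from strands.experimental.bidi.models.openai_realtime import BidiOpenAIRealtimeModel
        return BidiOpenAIRealtimeModel(**kwargs)
    if provider == "gemini":
        # Hypothetical class name -- check the Strands docs for the real one.
        from strands.experimental.bidi.models import BidiGeminiLiveModel
        return BidiGeminiLiveModel(**kwargs)
    raise ValueError(f"Unknown provider: {provider!r}")
```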
Trying it out
Now that we've briefly gone over BidiAgent, let's actually try having a conversation in Japanese.
Installing the SDK
I'll use uv as the package manager. Following the documentation, let's install the packages.
uv init
uv add "strands-agents-tools" "strands-agents[bidi,bidi-all,bidi-openai]"
Registering environment variables
Register the OpenAI API Key via environment variables.
export OPENAI_API_KEY=your_api_key
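The Realtime model reads this key from the environment, so a quick sanity check before starting the agent avoids a confusing authentication error mid-session. This is a plain-Python sketch, independent of Strands:

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fail fast with a clear message if the API key is not set."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; run: export {name}=your_api_key")
    return key
```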
Implementing the code
Let's implement BidiAgent. It could probably be trimmed even further with some effort, but it's impressive that it already works in this few lines.
import asyncio

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
from strands.experimental.bidi.models.openai_realtime import BidiOpenAIRealtimeModel
from strands.experimental.bidi.tools import stop_conversation
from strands_tools import calculator

async def main() -> None:
    model = BidiOpenAIRealtimeModel(
        model_id="gpt-realtime",
        provider_config={
            "audio": {
                "voice": "coral",
            }
        },
    )
    # stop_conversation tool allows the user to verbally stop agent execution.
    agent = BidiAgent(
        model=model,
        tools=[calculator, stop_conversation],
        system_prompt="You are a helpful assistant that can use the calculator tool to calculate numbers.",
    )
    audio_io = BidiAudioIO()
    text_io = BidiTextIO()
    await agent.run(inputs=[audio_io.input()], outputs=[audio_io.output(), text_io.output()])

if __name__ == "__main__":
    asyncio.run(main())
For the available provider_config options, refer to the documentation:
I ran the code, and the conversation worked well.
Note that currently the PC's speaker output can be picked up as microphone input, so I recommend testing with headphones.
Summary
That's it for "Creating a Voice Interface Agent using Strands Agent's BidiAgent and OpenAI Realtime API."
It's great that you can create a voice interface agent with very few lines of code.
This was Takakuni (@takakuni_) from the Cloud Business Division Consulting Department!

