Bi-directional streaming for real-time agent interactions now available in Amazon Bedrock AgentCore Runtime

Building natural voice conversations with AI agents requires complex infrastructure and lots of code from engineering teams. Text-based agent interactions follow a turn-based pattern: a user sends a complete request, waits for the agent to process it, and receives a full response before continuing. Bi-directional streaming removes this constraint by establishing a persistent connection that carries data in both directions simultaneously.

Amazon Bedrock AgentCore Runtime supports bi-directional streaming for real-time, two-way communication between users and AI agents. With this capability, agents can simultaneously listen to user input while generating responses, creating a more natural conversational flow. This is particularly well-suited for multimodal interactions, such as voice and vision agent conversations. The agent can begin responding while still receiving user input, handle mid-conversation interruptions, and adjust its responses based on real-time feedback.

A bi-directional voice chat agent can conduct spoken conversations with the fluidity of human dialogue so that users can interrupt, clarify, or change topics naturally. These agents process streaming audio input and output simultaneously while maintaining conversational state. Building this infrastructure requires managing persistent low-latency connections, handling concurrent audio streams, preserving context across exchanges, and scaling multiple conversations. Implementing these capabilities from scratch demands months of engineering effort and specialized real-time systems expertise. Amazon Bedrock AgentCore Runtime addresses these challenges by providing a secure, serverless, and purpose-built hosting environment for deploying and running AI agents, without requiring developers to build and maintain complex streaming infrastructure themselves.

In this post, you will learn about bi-directional streaming on AgentCore Runtime and the prerequisites to create a WebSocket implementation. You will also learn how to use Strands Agents to implement a bi-directional streaming solution for voice agents.

AgentCore Runtime bi-directional streaming

Bi-directional streaming uses the WebSocket protocol. WebSocket provides full-duplex communication over a single TCP connection, establishing a persistent channel where data flows continuously in both directions. This protocol has broad client support across browsers, mobile applications, and server environments, making it accessible for diverse implementation scenarios.

When a connection is established, the agent can receive user input as a stream while simultaneously sending response chunks back to the user. The AgentCore Runtime manages the underlying infrastructure that handles connection, message ordering, and maintains conversational state across the bi-directional exchange. This alleviates the need for developers to build custom streaming infrastructure or manage the complexities of concurrent data flows.Voice conversations differ from text-based interactions in their expectation of natural flow. When speaking with a voice agent, users expect the same conversational dynamics they experience with humans: the ability to interrupt when they need to correct themselves, to interject clarification mid-response, or to redirect the conversation without awkward pauses.With bi-directional streaming, it’s possible for voice agents to process incoming audio while generating responses, detecting interruptions, and adjusting behavior in real-time. The agent maintains conversational context throughout these interactions, preserving the thread of dialogue even as the conversation shifts direction. This capability also helps voice agents from turn-based systems into a responsive conversational partner.

Beyond voice conversations, bi-directional streaming has several interaction patterns. Interactive debugging sessions allow developers to guide agents through problem-solving in real-time, providing feedback as the agent explores solutions. Collaborative agents can work alongside users on shared tasks, receiving continuous input as the work progresses rather than waiting for complete instructions. Multi-modal agents can process streaming video or sensor data while simultaneously providing analysis and recommendations. Async long-running agent operations can process tasks over minutes or hours while streaming incremental results to clients.

WebSocket implementation

To create a WebSocket implementation in AgentCore Runtime, you should follow a few patterns. Firstly, your containers must implement WebSocket endpoints on port 8080 at the /ws path, which aligns with standard WebSocket server practices. This WebSocket endpoint will enable a single agent container to serve both the traditional InvokeAgentRuntime API and the new InvokeAgentRuntimeWithWebsocketStream API. Additionally, customers must provide a /ping endpoint for health checks.

Bi-directional streaming using WebSockets on AgentCore Runtime supports applications using a WebSocket language library. The client must connect to the service endpoint with a WebSocket protocol connection:

wss://bedrock-agentcore..amazonaws.com/runtimes//ws

You also need to use one of the supported authentication methods (SigV4 headers, SigV4 pre-signed URL, or OAuth 2.0) and to make sure that the agent application implements the WebSocket service contract as specified in HTTP protocol contract.

Strands bi-directional agent: Simplified voice agent development

Amazon Nova Sonic unifies speech understanding and generation into a single model, delivering human-like conversational AI with low latency, leading accuracy, and strong price performance. Its integrated architecture provides expressive speech generation and real-time transcription in one model, dynamically adapting responses based on input speech prosody, pace, and timbre.

With bi-directional streaming now also available in AgentCore Runtime, you have several ways to show how to host a voice agent: one can be the direct implementation where you need to managing WebSocket connections, parsing protocol events, handling audio chunks, and orchestrating async tasks; another is the strands bi-directional agent implementation that abstracts this complexity and implements these steps on its own.

Example Implementation

In this post, you should refer to the Amazon Bedrock AgentCore bi-directional code, which implements bi-directional communication with Amazon Bedrock AgentCore. The repository has two implementations: One that uses native Amazon Nova Sonic Python implementation deployed directly to AgentCore Runtime, and a high-level framework implementation using the Strands bi-directional agent for simplified real-time audio conversations.

The following diagram shows the native Amazon Nova Sonic Python WebSocket server directly to AgentCore. It provides full control over the Nova Sonic protocol with direct event handling for complete visibility into session management, audio streaming, and response generation.

The Strands bi-directional agent framework for real-time audio conversations with Amazon Nova Sonic provides a high-level abstraction that simplifies bi-directional streaming, automatic session management, and tool integration. The code snippet below is an example of this simplification.

from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands_tools import calculator
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket, model_name: str):
# Define a Nova Sonic BidiModel
model = BidiNovaSonicModel(
region="us-east-1",
model_id="amazon.nova-sonic-v1:0",
provider_config={
"audio": {
"input_sample_rate": 16000,
"output_sample_rate": 24000,
"voice": "matthew",
}
}
)
# Create a Strands Agent with tools and system prompt
agent = BidiAgent(
model=model,
tools=[calculator],
system_prompt="You are a helpful assistant with access to a calculator tool.",
)
# Start streaming conversation
await agent.run(inputs=[receive_and_convert], outputs=[websocket.send_json])

This implementation demonstrates the simplicity of Strands: instantiate a model, create an agent with tools and a system prompt, and run it with input/output streams. The framework handles protocol complexity internally.

The following is the agent declaration section in the code:

agent = BidiAgent(
    model=model,
    tools=[calculator, weather_api, database_query],
    system_prompt="You are a helpful assistant..."
)

Tools are passed directly to the agent’s constructor, and Strands handles function calling orchestration automatically. In summary, a native WebSocket implementation of the same functionality requires approximately 150 lines of code, whereas Strands implementation reduces this to approximately 20 lines focused on business logic. Developers can focus on defining agent behavior, integrating tools, and crafting system prompts rather than managing WebSocket connections, parsing events, handling audio chunks, or orchestrating async tasks. This makes bi-directional streaming accessible to developers without specialized real-time systems expertise while maintaining full access to the audio conversation capabilities of Nova Sonic. The Strands bi-directional feature is currently only supported for the Python SDK. If you are looking for flexibility in the implementation of your voice agent, the native Amazon Nova Sonic implementation can help you. Also, this can be important for the cases where you have multiple different patterns of communication from agent to model. With Amazon Nova Sonic implementation you will be able to control every step of the process with full control. The framework approach can provide better control of dependencies, because it is done by the SDK, and provides consistency across systems. The same Strands bi-directional agent code structure works with Nova Sonic, OpenAI Realtime API, and Google Gemini Live developers simply swap the model implementation while keeping the rest of their code unchanged.

Conclusion

The bi-directional streaming capability of Amazon Bedrock AgentCore Runtime transforms how developers can build conversational AI agents. By providing WebSocket-based real-time communication infrastructure, AgentCore removes months of engineering effort required to implement streaming systems from scratch. The framework runtime enables developers to deploy multiple types of voice agents—from native protocol implementations using Amazon Nova Sonic to high-level frameworks like the Strands bi-directional agent—within the same secure, serverless environment.

About the authors

Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.

Phelipe Fabres is a Senior Specialist Solutions Architect for Generative AI at AWS for Startups. He specializes in AI/ML with a focus on Agentic systems and the full process of training/inference. He has more than 10 years of working with software development, from monolith to event-driven architectures with a Ph.D. in Graph Theory.

Evandro Franco is an Sr. Data Scientist working on Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on top of AWS, mainly on Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning. In his free time, Evandro enjoys playing with his son, mainly building some funny Lego bricks.