Key Update Details
- Developer: OpenAI
- Feature: WebSocket-based execution mode for Responses API
- Purpose: Reduce latency and improve throughput in agentic workflows
- Status: Released in alpha to selected partners after a two-month development cycle
- Impact: Up to 40% latency reduction in early production use
OpenAI Boosts AI Agent Speed with WebSocket Mode
The new WebSocket mode addresses a critical bottleneck in modern AI agent development. Agentic workflows, which involve multiple steps like tool calls, intermediate reasoning, and follow-up queries, previously suffered from repeated network round-trip times due to the stateless nature of HTTP. As AI inference speeds have improved, these network delays became a dominant source of latency and operational complexity.
By establishing a long-lived, bidirectional connection, the WebSocket-based execution mode allows for continuous data exchange without the overhead of repeated handshakes. This architectural change supports streaming responses and faster tool execution, streamlining the coordination of multi-step workflows. OpenAI noted that early production use has demonstrated up to a 40% latency reduction and enhanced throughput in high-concurrency scenarios.
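The latency arithmetic behind this change can be sketched with a back-of-the-envelope model: under stateless HTTP, every workflow step pays connection setup plus a round trip, while a persistent WebSocket pays setup once. The timing constants below are illustrative assumptions, not OpenAI's measured figures:

```python
# Back-of-the-envelope latency model for a multi-step agent workflow.
# All timings are illustrative assumptions, not measured OpenAI figures.

HANDSHAKE_MS = 150   # assumed TCP + TLS setup cost per new connection
RTT_MS = 50          # assumed network round trip per request/response
STEPS = 10           # tool calls + follow-up queries in one agent run

def http_total(steps: int) -> int:
    """Stateless HTTP: every step pays handshake + round trip."""
    return steps * (HANDSHAKE_MS + RTT_MS)

def websocket_total(steps: int) -> int:
    """Persistent WebSocket: one handshake, then a round trip per step."""
    return HANDSHAKE_MS + steps * RTT_MS

print(http_total(STEPS))       # 2000 ms across ten steps
print(websocket_total(STEPS))  # 650 ms across the same ten steps
```

The exact savings depend on real handshake and round-trip costs, but the shape of the model explains why the benefit grows with the number of steps in a workflow.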
Addressing the Latency Challenge in Multi-Step AI Workflows
This update reflects a broader industry focus on optimizing the transport layer in agentic systems, where communication patterns significantly influence overall performance. The approach aligns with event-driven design principles common in distributed systems, where maintaining state across interactions boosts responsiveness and throughput. Ofek Shaked, a Vibe Coder, praised the change, stating, "WebSockets for agent state is such an obvious but huge win. No more cold starts killing your multi-tool chains."
OpenAI’s early production results highlight sustained throughput of approximately 1,000 transactions per second (TPS), with bursts reaching up to 4,000 TPS. Developer tooling and coding agent platforms have quickly adopted the new mode. Vercel, for instance, integrated the WebSocket mode into its AI SDK and reported a 40% latency reduction. Similarly, Cline observed a 39% improvement in multi-file workflows, while Cursor saw gains of up to 30%. These figures underscore the impact of system-level optimizations beyond model improvements. Gabriel Chua, a DX Engineer at OpenAI, confirmed the feature’s flexibility, noting, "You can warm up the connection by sending your system prompt and tool definitions first. It’s Zero Data Retention (ZDR) compatible."
A Return to Stateful Connections for Next-Gen AI
From an implementation standpoint, developers can now replace numerous HTTP calls with a single persistent session, simplifying orchestration logic and reducing connection setup overhead. This also enhances support for streaming applications, such as incremental code generation and interactive reasoning. Kevin Cho, an engineer at Microsoft, commented that this approach signals "Going back to the original software stack problems. websockets and stateful connections."
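The single-persistent-session pattern described above can be sketched as an async context manager that holds one connection for the whole workflow. The names here (`AgentSession`, `send_step`) are invented for illustration, and the transport is stubbed out; the actual Responses API client surface may differ:

```python
import asyncio

class AgentSession:
    """Illustrative persistent-session wrapper: one connection, many steps.
    The transport is a stub; a real client would hold an open WebSocket."""

    def __init__(self):
        self.connected = False
        self.steps_sent = 0

    async def __aenter__(self):
        # Connection setup happens exactly once per session.
        await asyncio.sleep(0)  # stand-in for the WebSocket handshake
        self.connected = True
        return self

    async def __aexit__(self, *exc):
        self.connected = False

    async def send_step(self, payload: dict) -> dict:
        # Each workflow step reuses the open connection: no new handshake.
        assert self.connected, "session must be open"
        self.steps_sent += 1
        await asyncio.sleep(0)  # stand-in for one network round trip
        return {"step": self.steps_sent, "echo": payload}

async def run_workflow() -> int:
    # Orchestration collapses to one session instead of N separate HTTP calls.
    async with AgentSession() as session:
        for name in ("plan", "tool_call", "final_answer"):
            await session.send_step({"action": name})
        return session.steps_sent

print(asyncio.run(run_workflow()))  # 3
```

The context-manager shape keeps connection lifecycle explicit: setup and teardown live in one place, and every intermediate step is just a message on the existing channel.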
The shift introduces new design considerations, including connection lifecycle management and backpressure under high concurrency, aligning with established patterns for stateful distributed systems. OpenAI launched the feature in alpha, with partners like Codex already migrating most of their Responses API traffic to the WebSocket mode, signaling its readiness for broader adoption.
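One of those design considerations, backpressure, is commonly handled with a bounded queue between the message producer and consumer, so a slow consumer throttles the producer instead of letting buffers grow without limit. This is a generic asyncio pattern, not an OpenAI API:

```python
import asyncio

async def producer(queue: asyncio.Queue, n: int):
    # A bounded queue makes `put` block when the consumer falls behind,
    # propagating backpressure instead of buffering unboundedly.
    for i in range(n):
        await queue.put(i)
    await queue.put(None)  # sentinel: no more messages

async def consumer(queue: asyncio.Queue) -> int:
    handled = 0
    while True:
        item = await queue.get()
        if item is None:
            break
        handled += 1
    return handled

async def main() -> int:
    queue = asyncio.Queue(maxsize=4)  # small buffer forces backpressure
    _, handled = await asyncio.gather(producer(queue, 20), consumer(queue))
    return handled

print(asyncio.run(main()))  # 20
```

Under high concurrency, the same idea applies per connection: cap in-flight messages so a stalled WebSocket peer slows its sender rather than exhausting memory.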