AWS Unveils New Bedrock and SageMaker Features

Amazon Web Services has shipped a batch of updates across its cloud platform, boosting core infrastructure services while zeroing in on generative AI workflows in Amazon Bedrock and SageMaker. The releases target two pain points developers have been vocal about: the cost of running complex AI agents and the fragmented tooling around ML development.

Bedrock Updates at a Glance

Feature                     | What It Does                                                                          | Status
Server-Side Tool Use        | Manages tools on the server for multi-turn agent workflows, reducing latency and cost | Available now (select LLMs, including OpenAI models)
Prompt Caching (1-hour TTL) | Caches prompts to serve repeated queries instantly without reprocessing               | Generally available for select Claude models

Server-Side Tool Use in Bedrock

The Responses API in Amazon Bedrock now supports server-side tool use, a capability designed to streamline long-running, multi-turn agent workflows. Instead of ping-ponging tool calls between client and server, the tools are managed entirely on the server side. According to AWS, this improves performance and lowers costs for complex agentic applications. The feature is initially available with select LLMs, including models from OpenAI.
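To make the shift concrete, here is a minimal sketch of a Responses-style agent request in which tool definitions travel with the request and tool-call state stays server-side. The payload shape, model ID, and `get_weather` tool are illustrative assumptions, not the exact Bedrock schema; consult the AWS documentation for the real field names.

```python
import json

# Hypothetical Responses-style payload for illustration only -- the exact
# Bedrock Responses API schema may differ; check the AWS docs before use.
def build_agent_request(model_id: str, user_input: str, tools: list[dict]) -> dict:
    """Build a request where tools are declared once and the service, not the
    client, loops over tool calls across a multi-turn agent workflow."""
    return {
        "model": model_id,
        "input": user_input,
        # With server-side tool use, these calls no longer ping-pong back to
        # the client between turns.
        "tools": tools,
    }

request = build_agent_request(
    "openai.example-model-v1",  # placeholder model ID
    "What is the weather in Seattle?",
    tools=[{
        "type": "function",
        "name": "get_weather",  # hypothetical tool for this sketch
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
)
print(json.dumps(request, indent=2))
```

The point of the pattern is in the comments: the tool loop runs inside AWS, so each intermediate tool call stops costing a client round trip.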

Prompt Caching Cuts Latency and Cost

A 1-hour prompt caching Time-To-Live (TTL) is now generally available for select Claude models in Bedrock. For applications handling repetitive or templated queries, this is a meaningful cost reduction. The cached prompt is served instantly on subsequent requests without reprocessing, cutting both latency and inference spend in one move.
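As a sketch of how this looks in practice, the Converse API lets you mark a cache checkpoint in the request body so that everything before it can be served from cache on repeat calls. The model ID and prompt text below are placeholders, and how the new 1-hour TTL is enabled for a given Claude model may differ from the default caching behavior; verify the details in the AWS docs.

```python
# Stand-in for a large, reusable system prompt worth caching.
LONG_SYSTEM_PROMPT = "You are a support assistant for ExampleCo. " * 200

def build_cached_converse_request(model_id: str, question: str) -> dict:
    """Build a Bedrock Converse request body with a cache checkpoint placed
    after the long, unchanging system prompt."""
    return {
        "modelId": model_id,  # placeholder Claude model ID
        "system": [
            {"text": LONG_SYSTEM_PROMPT},
            # Content before this cachePoint is eligible for caching, so
            # subsequent requests with the same prefix skip reprocessing it.
            {"cachePoint": {"type": "default"}},
        ],
        "messages": [
            {"role": "user", "content": [{"text": question}]},
        ],
    }

req = build_cached_converse_request("anthropic.claude-example-v1", "How do I reset my password?")
```

In a real application the dict would be passed to the `bedrock-runtime` client's `converse` call; only the short user question changes between requests, which is exactly the templated-query pattern the 1-hour TTL rewards.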

SageMaker Gets a Unified Studio

Amazon SageMaker has launched its Unified Studio, consolidating the scattered tooling around ML development into a single interface. Building, training, and deploying models previously required jumping between multiple environments. Unified Studio brings the entire ML lifecycle into one workspace, a move aimed at cutting the context-switching overhead that slows developers down.

Infrastructure Updates Running Alongside

The AI-focused releases didn’t ship in isolation. AWS simultaneously upgraded several core services that underpin increasingly data-intensive workloads:

  • Amazon EventBridge: Maximum event payload size increased from 256 KB to 1 MB, allowing richer event data including complex JSON, telemetry, and AI outputs without splitting payloads or relying on external storage.
  • AWS Network Firewall: Now includes generative AI traffic visibility through web category filtering, giving administrators granular control over how AI applications are used within their networks.
  • Amazon Keyspaces: Table pre-warming has been introduced for predictable performance on high-throughput workloads running on the managed Cassandra-compatible database.
  • AWS Verified Access: New capabilities for building and enforcing zero-trust access policies across complex, multi-account AWS environments.
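The EventBridge change is easiest to appreciate with a quick size check: payloads that previously had to be split or offloaded to external storage now fit in a single event. The size accounting below (UTF-8 bytes of `Source`, `DetailType`, and `Detail`) is an approximation of how EventBridge measures entries, and the source and detail-type names are hypothetical; confirm the exact accounting in the AWS docs before relying on it.

```python
import json

MAX_EVENT_BYTES = 1_048_576  # new 1 MB limit, up from 256 KB (262,144 bytes)

def entry_size_bytes(entry: dict) -> int:
    """Approximate the size EventBridge charges an entry against the limit:
    the UTF-8 byte length of its main string fields."""
    return sum(
        len(entry[key].encode("utf-8"))
        for key in ("Source", "DetailType", "Detail")
        if key in entry
    )

# A large AI-generated output that would have exceeded the old 256 KB cap.
entry = {
    "Source": "my.app",                        # hypothetical source name
    "DetailType": "model.inference.complete",  # hypothetical detail type
    "Detail": json.dumps({"output": "x" * 500_000}),
}
assert entry_size_bytes(entry) <= MAX_EVENT_BYTES  # fits under the new limit
```

An entry like this would previously have forced a pointer-to-S3 pattern; now it can go straight through `put_events` in one piece.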

What This Signals

Taken together, the updates reflect AWS’s strategy to make generative AI development more performant and cost-effective at scale. Server-side tool use and prompt caching directly address the economics of running AI agents in production. Unified Studio targets the developer experience friction that slows ML iteration. And the infrastructure upgrades to EventBridge, Network Firewall, and Keyspaces ensure the platform can handle the heavier, more security-sensitive workloads these AI features generate.
