In a social media post and a corresponding company blog update, Google Cloud officials confirmed the new capability. Richard Seroter, a Director of Outbound Product Management at Google, stated that Google Cloud Run now supports NVIDIA RTX 6000 PRO series GPUs based on the Blackwell architecture. The enhancement is designed to streamline high-throughput tasks such as large language model (LLM) inference by providing on-demand, scalable GPU resources within a serverless framework.
The integration aims to remove the operational overhead typically associated with managing GPU-accelerated hardware. According to the announcement, the platform handles the complex setup, allowing developers to focus on their applications. The key specifications for these new serverless instances include:
- GPU Architecture: NVIDIA Blackwell
- Supported Models: Optimized for 70B+ parameter language models
- CPU Options: 20 to 44 vCPUs
- Memory Options: 80 to 176 GiB
- Developer Features: Pre-installed drivers and no requirement for capacity reservations
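To illustrate what a deployment against these specifications could look like, here is a minimal sketch using the documented `gcloud run deploy` GPU flags (`--gpu` and `--gpu-type`). Note that the announcement does not state the `--gpu-type` identifier for the new Blackwell GPUs, so the existing, documented `nvidia-l4` type is shown as a placeholder, and the service name, image path, and resource values are illustrative, not from the announcement.

```shell
# Hypothetical deployment sketch -- all names and values are illustrative.
# Cloud Run GPU services currently require CPU to be always allocated
# (--no-cpu-throttling); the announcement notes no capacity reservation
# is needed and drivers come pre-installed, so no extra setup is shown.
gcloud beta run deploy llm-inference \
  --image=us-docker.pkg.dev/my-project/llm/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=8 \
  --memory=32Gi \
  --no-cpu-throttling \
  --max-instances=3
```

Once Google publishes the type identifier for the new GPUs, swapping the `--gpu-type` value should be the only change needed to target them.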
The primary motivation for this update is to lower the barrier to entry for deploying sophisticated AI models. By abstracting away infrastructure management, Google Cloud Run aims to provide a more efficient and scalable solution for AI inference. The company claims this serverless approach enables teams to handle variable workloads and high-throughput demands without pre-provisioning or administering dedicated GPU clusters, directly addressing a common bottleneck in AI application development.
While the technical specifications have been outlined, several key details remain unannounced. The official pricing model for these new GPU-enabled instances has not been released. Furthermore, specific details on regional availability and performance benchmarks comparing these serverless instances against dedicated virtual machines are not yet public.
This update positions Google Cloud Run as a more competitive option for developers building and deploying generative AI applications. The availability of powerful, on-demand GPUs is expected to accelerate the development of serverless AI-powered services. Users can anticipate further documentation and tutorials from Google on how to best leverage these new instances for LLM inference and other demanding computational tasks.
For organizations and developers interested in this new capability, the immediate steps involve reviewing official documentation and assessing current workflows.
- Review the official Google Cloud Run documentation for detailed guides on deploying GPU-accelerated services.
- Evaluate existing AI and machine learning workloads to identify potential candidates for migration to a serverless model.
- Monitor the official Google Cloud blog for forthcoming announcements regarding pricing, regional rollouts, and performance data.