In a social media post and a corresponding company blog update, Google Cloud officials confirmed the new capability. Richard Seroter, a Director of Outbound Product Management at Google, stated that Google Cloud Run now supports NVIDIA RTX 6000 PRO series GPUs based on the Blackwell architecture. The enhancement is designed to streamline high-throughput tasks such as large language model (LLM) inference by providing on-demand, scalable GPU resources within a serverless framework.
The integration aims to remove the operational overhead typically associated with managing GPU-accelerated hardware. According to the announcement, the platform handles the complex setup, allowing developers to focus on their applications. The key specifications for these new serverless instances include:
- GPU Architecture: NVIDIA Blackwell
- Supported Models: Optimized for 70B+ parameter language models
- CPU Options: 20 to 44 vCPUs
- Memory Options: 80 to 176 GiB
- Developer Features: Pre-installed drivers and no requirement for capacity reservations
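To illustrate what a deployment against these specifications could look like, here is a minimal sketch using the documented `gcloud run deploy` GPU flags (`--gpu` and `--gpu-type`). Note that the announcement does not state the `--gpu-type` identifier for the new Blackwell GPUs, so the existing, documented `nvidia-l4` type is shown as a placeholder, and the service name, image path, and resource values are illustrative, not from the announcement.

```shell
# Hypothetical deployment sketch -- all names and values are illustrative.
# Cloud Run GPU services currently require CPU to be always allocated
# (--no-cpu-throttling); the announcement notes no capacity reservation
# is needed and drivers come pre-installed, so no extra setup is shown.
gcloud beta run deploy llm-inference \
  --image=us-docker.pkg.dev/my-project/llm/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=8 \
  --memory=32Gi \
  --no-cpu-throttling \
  --max-instances=3
```

Once Google publishes the type identifier for the new GPUs, swapping the `--gpu-type` value should be the only change needed to target them.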
The primary motivation for this update is to lower the barrier to entry for deploying sophisticated AI models. By abstracting away infrastructure management, Google Cloud Run aims to provide a more efficient and scalable solution for AI inference. The company claims this serverless approach enables teams to handle variable workloads and high-throughput demands without pre-provisioning or administering dedicated GPU clusters, directly addressing a common bottleneck in AI application development.
While the technical specifications have been outlined, several key details remain unannounced. The official pricing model for these new GPU-enabled instances has not been released. Furthermore, specific details on regional availability and performance benchmarks comparing these serverless instances against dedicated virtual machines are not yet public.
This update positions Google Cloud Run as a more competitive option for developers building and deploying generative AI applications. The availability of powerful, on-demand GPUs is expected to accelerate the development of serverless AI-powered services. Users can anticipate further documentation and tutorials from Google on how to best leverage these new instances for LLM inference and other demanding computational tasks.
For organizations and developers interested in this new capability, the immediate steps involve reviewing official documentation and assessing current workflows.
- Review the official Google Cloud Run documentation for detailed guides on deploying GPU-accelerated services.
- Evaluate existing AI and machine learning workloads to identify potential candidates for migration to a serverless model.
- Monitor the official Google Cloud blog for forthcoming announcements regarding pricing, regional rollouts, and performance data.