Gemini 3.5 Flash Offers Speed, Costs 3x More Than Its Predecessor

Google has unveiled Gemini 3.5 Flash, a new AI model that significantly outperforms the company's own Gemini 3.1 Pro and runs four times faster than rival frontier models. The speed and performance come with a notable caveat: the new Flash model costs three times as much as the previous Gemini 3 Flash, signaling a strategic shift by Google to position Flash models as premium offerings rather than budget alternatives.

Announced at Google I/O 2026, Gemini 3.5 Flash is now the default AI model across the Gemini app, AI Mode in Google Search, and the newly launched Gemini Spark personal agent. The multimodal model accepts text, image, audio, and video input and generates text output. It has a 1 million token context window and supports up to 65,536 output tokens.

According to Google’s official blog, the model is purpose-built for agentic workflows and complex coding tasks, aiming to handle multi-step projects with tangible results.

Key Specifications

| Specification | Detail |
| --- | --- |
| Release Date | |
| API Model ID | gemini-3.5-flash |
| Context Window | 1,048,576 input tokens |
| Max Output | 65,536 tokens |
| Modalities | Text, Image, Audio, Video Input; Text Output |
| Thinking Levels | Minimal, Low, Medium (Default), High |
| Input Pricing | $1.50 per 1M tokens |
| Output Pricing | $9.00 per 1M tokens |
| Speed | 284 tokens/second |
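
For developers sizing requests against these limits, the check can be sketched in a few lines of Python. This is only an illustration: `plan_request` is a hypothetical helper, not part of any Google SDK, and it assumes the input and output limits apply independently, which the specifications above do not spell out.

```python
# Limits quoted in the specification table above.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def plan_request(input_tokens: int, requested_output_tokens: int) -> dict:
    """Hypothetical helper: compare a request against the published limits.

    Assumption: the input and output limits are independent of each other.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return {
        # True when the prompt fits inside the context window.
        "fits_context": input_tokens <= MAX_INPUT_TOKENS,
        # Output request clamped to the published maximum.
        "output_tokens": min(requested_output_tokens, MAX_OUTPUT_TOKENS),
        # Input tokens still available (negative when over the limit).
        "input_headroom": MAX_INPUT_TOKENS - input_tokens,
    }
```

A 1,000,000-token prompt, for example, fits with 48,576 input tokens to spare, while a request for 100,000 output tokens would be clamped to 65,536.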

Performance Benchmarks

Gemini 3.5 Flash not only improves upon its predecessor but also surpasses Google’s own Gemini 3.1 Pro model in various benchmarks. On the Artificial Analysis Intelligence Index, Gemini 3.5 Flash scores 55, outperforming Claude Sonnet 4.6 (52) and Grok 4.3 (53). The model achieved significant gains in agentic evaluations and reduced its hallucination rate by 31 points compared to Gemini 3 Flash.

The model’s multimodal capabilities are particularly noteworthy, scoring 84% on MMMU-Pro. Unlike some competitors that support only image input alongside text, Gemini 3.5 Flash natively processes images, video, speech, and text concurrently. Google CEO Sundar Pichai stated at the I/O keynote that the model generates output at 284 tokens per second, making it four times faster than other frontier models.
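
To put the speed figure in practical terms, the short Python sketch below estimates how long a response would take to stream at the quoted rate. It assumes the 284 tokens per second figure holds steadily for the whole response and ignores time to first token, neither of which the keynote claim guarantees.

```python
TOKENS_PER_SECOND = 284     # output speed quoted at the I/O keynote
MAX_OUTPUT_TOKENS = 65_536  # maximum output length from the spec table


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate streaming time, assuming a constant output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    full = generation_seconds(MAX_OUTPUT_TOKENS)
    quarter_speed = generation_seconds(MAX_OUTPUT_TOKENS, TOKENS_PER_SECOND / 4)
    print(f"Max-length response at 284 tok/s: {full / 60:.1f} minutes")
    print(f"Same response at a quarter of that speed: {quarter_speed / 60:.1f} minutes")
```

Under these assumptions a maximum-length response takes just under four minutes, against more than fifteen minutes for a model running at a quarter of the speed.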

How Gemini 3.5 Flash Compares to Rivals

| Model | Intelligence Index | Speed (tok/s) | Input $/1M | Output $/1M |
| --- | --- | --- | --- | --- |
| GPT-5.5 (xhigh) | 60 | ~140 | $5.00 | $30.00 |
| Claude Opus 4.7 (max) | 57 | ~120 | $5.00 | $25.00 |
| Gemini 3.5 Flash (high) | 55 | 284 | $1.50 | $9.00 |
| Grok 4.3 (high) | 53 | ~150 | $1.25 | $2.50 |
| Claude Sonnet 4.6 (max) | 52 | ~110 | $3.00 | $15.00 |
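
The list prices above can be turned into a rough per-workload comparison. The Python sketch below prices a hypothetical job of 10 million input tokens and 2 million output tokens on each model. The workload size is an assumption chosen for illustration, and the comparison ignores the fact that different models may consume different numbers of tokens for the same task.

```python
# (input $/1M tokens, output $/1M tokens), copied from the comparison table.
PRICES = {
    "GPT-5.5 (xhigh)": (5.00, 30.00),
    "Claude Opus 4.7 (max)": (5.00, 25.00),
    "Gemini 3.5 Flash (high)": (1.50, 9.00),
    "Grok 4.3 (high)": (1.25, 2.50),
    "Claude Sonnet 4.6 (max)": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one workload at the model's list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens, 2M output tokens.
    costs = {m: workload_cost(m, 10_000_000, 2_000_000) for m in PRICES}
    for model, cost in sorted(costs.items(), key=lambda item: item[1]):
        print(f"{model:<26} ${cost:>7.2f}")
```

On this illustrative workload Gemini 3.5 Flash comes to $33.00, well below GPT-5.5 ($110.00) and Claude Opus 4.7 ($100.00) but nearly double Grok 4.3 ($17.50).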

The Pricing Jump

A significant change for developers and enterprises is the new pricing structure. Gemini 3 Flash was priced at $0.50 per million input tokens and $3.00 per million output tokens. The new Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens, representing a 3x price increase across the board.

While a 90% discount on cached input tokens ($0.15 per million) is available for repetitive workloads, the overall cost for comprehensive evaluations has reportedly increased by 5.5x. This adjustment indicates Google’s view of Flash-tier models as premium products rather than budget alternatives.
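
How far the caching discount offsets the price increase depends on how much of a workload's input is served from cache. The Python sketch below works through that arithmetic using only the prices quoted above. The function names are illustrative, and the model is deliberately simple: it assumes cached tokens are billed at a flat $0.15 per million, ignores any storage fees for cached context, and leaves output tokens undiscounted.

```python
OLD_INPUT_PRICE = 0.50     # Gemini 3 Flash, $ per 1M input tokens
NEW_INPUT_PRICE = 1.50     # Gemini 3.5 Flash, $ per 1M input tokens
CACHED_INPUT_PRICE = 0.15  # Gemini 3.5 Flash, $ per 1M cached input tokens
OUTPUT_PRICE = 9.00        # Gemini 3.5 Flash, $ per 1M output tokens


def blended_input_price(cached_fraction: float) -> float:
    """Average $ per 1M input tokens when a share of them hits the cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return ((1.0 - cached_fraction) * NEW_INPUT_PRICE
            + cached_fraction * CACHED_INPUT_PRICE)


def break_even_cached_fraction(target_price: float = OLD_INPUT_PRICE) -> float:
    """Cache-hit share at which the blended input price equals target_price."""
    return (NEW_INPUT_PRICE - target_price) / (NEW_INPUT_PRICE - CACHED_INPUT_PRICE)


def request_cost(input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Cost in USD of a request; caching only reduces the input side."""
    input_cost = input_tokens * blended_input_price(cached_fraction)
    return (input_cost + output_tokens * OUTPUT_PRICE) / 1_000_000


if __name__ == "__main__":
    print(f"Break-even cache-hit share: {break_even_cached_fraction():.1%}")
```

Under these assumptions roughly 74% of input tokens would need to be cache hits before the blended input price falls back to the old $0.50 level, and output tokens stay at three times the old price regardless.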

Google’s Agentic AI Ecosystem

The launch of Gemini 3.5 Flash underscores Google’s pivot from conversational AI to autonomous AI agents. This strategic shift aims for models that can autonomously plan, build, and execute real-world tasks. During internal tests, Gemini 3.5 Flash agents reportedly built an operating system from scratch, deploying 93 sub-agents and generating 2.6 billion tokens within approximately 12 hours.
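
The scale of that test run can be roughly translated into throughput and list-price cost. The Python sketch below does so under several assumptions that the reported figures do not settle: it treats "approximately 12 hours" as exactly 12, assumes all 93 sub-agents ran for the whole period, and, because the split between input and output tokens was not reported, gives only a lower bound (every token billed as input) and an upper bound (every token billed as output). Caching discounts are ignored.

```python
TOTAL_TOKENS = 2_600_000_000  # tokens reported for the run
SUB_AGENTS = 93               # sub-agents reported for the run
HOURS = 12                    # "approximately 12 hours", taken as exact
INPUT_PRICE = 1.50            # $ per 1M input tokens
OUTPUT_PRICE = 9.00           # $ per 1M output tokens


def aggregate_tokens_per_second() -> float:
    """Average tokens processed per second across all sub-agents."""
    return TOTAL_TOKENS / (HOURS * 3600)


def cost_bounds() -> tuple[float, float]:
    """(all tokens billed as input, all tokens billed as output), in USD."""
    millions = TOTAL_TOKENS / 1_000_000
    return millions * INPUT_PRICE, millions * OUTPUT_PRICE


if __name__ == "__main__":
    rate = aggregate_tokens_per_second()
    low, high = cost_bounds()
    print(f"Aggregate rate: {rate:,.0f} tok/s ({rate / SUB_AGENTS:,.0f} per sub-agent)")
    print(f"List-price cost: between ${low:,.0f} and ${high:,.0f}")
```

At list prices the run would fall somewhere between $3,900 and $23,400. The implied average of about 647 tokens per second per sub-agent is higher than the 284 tokens per second output speed, which suggests that the 2.6 billion figure counts input tokens as well as generated ones.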

Three new products support this agentic strategy:

Antigravity 2.0: Google’s agent-first development platform, now available as a standalone desktop application with a CLI and SDK. DeepMind’s chief technologist Koray Kavukcuoglu noted that it was co-developed with Gemini 3.5 Flash.

Gemini Spark: A 24/7 personal AI agent powered by Gemini 3.5 Flash, running on Google Cloud virtual machines. It can manage tasks across Workspace tools like Gmail and Docs, even when user devices are off. Beta access begins next week for Google AI Ultra subscribers in the US.

Managed Agents in the Gemini API: Allows developers to deploy agent workflows directly through the API, with native support for the Model Context Protocol (MCP).

This integrated approach gives Google a potential advantage, as Spark offers native access to the entire Workspace suite without additional setup, unlike rival agents that often require separate configuration.

Broader Context from Google I/O 2026

Gemini 3.5 Flash was part of a broader set of announcements at I/O 2026, highlighting Google’s expanding AI footprint. Other revelations included Gemini Omni, a new multimodal world model capable of generating video from various inputs, and conversational editing capabilities.

Google also reported substantial growth, with 900 million monthly active users on the Gemini app and over 1 billion monthly active users on AI Mode in Google Search. The company now processes 3.2 quadrillion tokens per month across its platforms, marking a sevenfold increase year-over-year.

What Developers Should Consider

For developers, Gemini 3.5 Flash offers near-Pro reasoning at Flash-tier speed, reducing the need to compromise between quality and latency for many tasks. However, the higher pricing necessitates careful cost management, particularly for high-volume agentic workloads where token usage can quickly accumulate.
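
One simple form of cost management is a spending cap that an agent loop checks after every model call. The Python sketch below shows the idea. `TokenBudget` and `BudgetExceeded` are hypothetical names, not part of the Gemini API or any Google SDK, and the sketch assumes the caller can obtain input and output token counts for each call and that every token is billed at the uncached list price.

```python
from dataclasses import dataclass

INPUT_PRICE_PER_M = 1.50   # $ per 1M input tokens (list price)
OUTPUT_PRICE_PER_M = 9.00  # $ per 1M output tokens (list price)


class BudgetExceeded(RuntimeError):
    """Raised once cumulative spend passes the configured limit."""


@dataclass
class TokenBudget:
    """Hypothetical spend tracker for a long-running agent loop."""
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost and return the remaining budget in USD.

        The call has already been billed, so the cost is recorded first
        and the exception then tells the loop to stop making new calls.
        """
        cost = (input_tokens * INPUT_PRICE_PER_M
                + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f}")
        return self.limit_usd - self.spent_usd
```

In use, an agent loop would call `budget.record(...)` with the token counts from each response and treat `BudgetExceeded` as the signal to halt or escalate to a human. A cap of this kind limits overshoot to a single call rather than preventing it entirely.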

Enterprises can combine Gemini 3.5 Flash, Antigravity 2.0, and Managed Agents to automate complex, multi-step workflows. Partner banks and fintechs are reportedly already using Flash-powered agents to automate multi-week processes. The wider evolution of AI, including competing offerings such as those in China’s Claude API grey market, will continue to shape how these models are adopted.
