Veo 3.1 Adds Vertical Video as Google Escalates the AI Video Arms Race

Google rolled out substantial upgrades to Veo 3.1 today, bringing native vertical video generation and enhanced reference image capabilities to its AI video platform. The update, now available in the Gemini app for Google AI Plus, Pro, and Ultra subscribers, addresses surging demand for mobile-first content while intensifying competition with OpenAI’s Sora 2 in an AI video generation market projected to reach $2.56 billion by 2032.

The headline feature enables automatic 9:16 portrait-mode video creation—a format dominating platforms like YouTube Shorts, Instagram Reels, and TikTok. Upload a vertical image, select the Video chip in Gemini, and Veo 3.1 preserves the aspect ratio while generating 8-second clips with synchronized audio. The shift reflects market realities: video accounts for over 65% of global mobile internet traffic, according to the U.S. National Telecommunications and Information Administration.

Shorter Prompts, Stronger Outputs: The Reference Image Revolution

Google’s second major upgrade transforms how Veo 3.1 interprets reference images. The system now generates videos with richer character expressions, more dynamic camera movements, and improved consistency across objects and backgrounds—all while requiring less detailed text prompts. Google describes the enhancement as enabling users to blend disparate elements (characters, textures, stylized backgrounds) with “more finesse.”

The practical implications are significant for content creators: characters maintain visual consistency when settings change, scenes remain stable when subjects transform within them, and multi-shot sequences require less manual prompt engineering. This addresses a persistent pain point in AI video generation—maintaining narrative and visual coherence across multiple generations without extensive trial-and-error.

| Feature | Veo 3.1 | Sora 2 (OpenAI) |
|---|---|---|
| Native vertical video | 9:16 portrait mode | Not explicitly supported |
| Clip length | 8 seconds per generation | 12-25 seconds (varies by mode) |
| Resolution | 1080p, with 4K upscaling | 1080p standard |
| Audio generation | Synchronized native audio | Synchronized dialogue & effects |
| Reference image support | Up to 3 reference images | Image input supported |
| Scene extension | Up to 148 seconds via chaining | Storyboard mode (limited) |
| Access | Gemini app, API, YouTube Shorts | Invite-only app, ChatGPT integration |

Platform Integration: From Gemini to YouTube Shorts

Google’s distribution strategy leverages its existing ecosystem. Veo 3.1’s vertical video capabilities now extend to YouTube Shorts and the YouTube Create app, enabling creators to generate short-form content directly within platforms they already use. Professional users gain access through Flow (Google’s video editor), the Gemini API, Vertex AI, and Google Vids, with 1080p and 4K upscaling available across enterprise tools.

This integrated approach contrasts with Sora 2’s standalone app strategy. While OpenAI’s model emphasizes ultra-realistic motion physics and cinematic quality through a dedicated iOS application, Veo 3.1 prioritizes workflow embedding—meeting creators where they already work rather than requiring platform shifts.
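
For developers, the Gemini API path means a vertical generation request is only a few lines of code. The sketch below is a minimal, non-authoritative example assuming the google-genai Python SDK and a Veo 3.1 preview model ID; check Google’s current API documentation for exact model names and supported config fields.

```python
# Minimal sketch: requesting a 9:16 vertical clip through the Gemini API.
# Assumes the google-genai SDK (pip install google-genai), an API key in the
# environment, and a "veo-3.1-generate-preview" model ID -- all of which
# should be verified against Google's current documentation.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment

# Video generation is a long-running operation: submit the request, then poll.
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed model ID
    prompt="A barista pours latte art in a sunlit cafe, handheld close-up",
    config=types.GenerateVideosConfig(
        aspect_ratio="9:16",  # native vertical output
    ),
)

while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download and save the first generated clip.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("vertical_clip.mp4")
print("Saved vertical_clip.mp4")
```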

The Competitive Landscape: Veo 3.1 vs Sora 2

The AI video generation market has entered a maturity phase characterized by rapid feature differentiation. Veo 3.1 focuses on controllable continuity, native vertical formats, and comprehensive editing toolsets including scene extension and object removal. Sora 2 emphasizes photorealistic motion, physics-accurate object interactions, and tighter audio synchronization, particularly for dialogue.

In blind testing comparisons, industry evaluators note distinct philosophies: Veo 3.1 delivers more predictable results when bridging multiple shots, while Sora 2 excels at naturally flowing physics in standalone clips. For prompt adherence, Veo 3.1 often generates more visually interesting compositions, whereas Sora 2 follows literal prompt instructions with greater precision.

The audio generation capabilities differ subtly but meaningfully. Veo 3.1 produces richer ambient soundscapes and more natural-sounding dialogue, particularly effective for narrative-driven content. Sora 2 excels at tightly synchronized sound effects aligned frame-by-frame with on-screen action, optimizing for social media virality and quick-cut editing styles.

Market Context: The $2.5 Billion AI Video Race

Today’s update arrives as the AI video generation market experiences explosive growth. Depending on methodology, market research firms project the sector will reach between $1.1 billion and $2.98 billion by 2032-2033, with compound annual growth rates ranging from 19.9% to 34.7%; Allied Market Research is more bullish still, estimating $9.3 billion by 2033, driven by marketing, education, and social media adoption.

The competitive dynamics reflect broader AI industry trends. North America dominates current adoption at 34% market share, led by robust tech infrastructure and early enterprise deployment. Asia-Pacific exhibits the highest growth rates—projected to expand fastest through 2033—fueled by rapid digitalization, mobile-first consumption patterns, and government support for AI innovation.

Key players beyond Google and OpenAI include Synthesia, HeyGen, Runway ML, and Adobe. The market has consolidated around several distinct segments: text-to-video platforms (45% market share), professional enterprise solutions, and consumer-facing creative tools. Each segment addresses different willingness-to-pay thresholds and technical sophistication levels.

Technical Capabilities and Limitations

Veo 3.1’s scene extension feature enables creators to build sequences lasting up to 148 seconds by chaining multiple 8-second clips, with each new generation conditioned on the final second of the previous output. This maintains visual continuity but requires careful prompt management to preserve narrative coherence. The system supports upscaling to 1080p and 4K resolution through Flow, the Gemini API, and Vertex AI, though not yet in the consumer Gemini app.
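
Outside of Flow, the chaining workflow can be scripted by hand. The sketch below is illustrative only: it seeds each new shot with the final frame of the previous clip (a simplification of Veo’s last-second conditioning), and the model ID and image-to-video call shape are assumptions based on the google-genai SDK’s published examples.

```python
# Illustrative multi-shot chaining: seed each new generation with the final
# frame of the previous clip. The model ID and image-conditioning pattern are
# assumptions; Flow's built-in scene extension conditions on the last second
# of the prior clip rather than a single frame.
import pathlib
import time

import cv2  # pip install opencv-python
from google import genai
from google.genai import types

client = genai.Client()


def last_frame(video_path: str, image_path: str) -> str:
    """Save the final frame of a clip to seed the next shot."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, max(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1, 0))
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"Could not read final frame of {video_path}")
    cv2.imwrite(image_path, frame)
    return image_path


def generate_clip(prompt: str, out_path: str, seed_image: str | None = None) -> str:
    """Submit one generation, poll until it finishes, save the clip."""
    image = None
    if seed_image:
        image = types.Image(
            image_bytes=pathlib.Path(seed_image).read_bytes(),
            mime_type="image/png",
        )
    operation = client.models.generate_videos(
        model="veo-3.1-generate-preview",  # assumed model ID
        prompt=prompt,
        image=image,  # continuity anchor from the previous shot, if any
        config=types.GenerateVideosConfig(aspect_ratio="9:16"),
    )
    while not operation.done:
        time.sleep(10)
        operation = client.operations.get(operation)
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save(out_path)
    return out_path


shots = [
    "Wide shot: a hiker crests a ridge at golden hour",
    "The same hiker sits on a rock and opens a steaming thermos",
    "Close-up: steam drifts across the valley view below",
]

seed = None
for i, shot in enumerate(shots):
    clip = generate_clip(shot, out_path=f"shot_{i}.mp4", seed_image=seed)
    seed = last_frame(clip, f"seed_{i}.png")
```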

All generated videos include imperceptible SynthID watermarks—Google’s digital provenance system enabling verification of AI-generated content. In December 2025, Google expanded verification capabilities to let users upload videos and confirm whether they were created with Google AI tools, addressing deepfake and authenticity concerns.

Critical limitations remain. Generation times average 1-2 minutes per 8-second clip, during which users cannot interact with the same chat session. Google AI Pro subscribers face daily generation limits, with notifications appearing as users approach thresholds. Ultra subscribers receive higher quotas but still encounter caps designed to manage compute costs and server load.

Use Cases: Where Vertical Video Changes the Game

The vertical video format unlocks content creation workflows that previously required post-production cropping. Product demonstrations for e-commerce can now be generated as 9:16 content directly from still images, avoiding the quality loss that comes with reframing. Educational content creators producing mobile-first tutorials benefit from native vertical framing that matches learner viewing habits.

Social media managers gain efficiency by eliminating manual reformatting. Generate landscape content for YouTube, then create vertical variants for Shorts, Instagram Reels, and TikTok from the same reference images—each optimized for platform-specific aspect ratios. Early adopters report 5-10x increases in video output volume compared to traditional production methods, aligning with broader AI-assisted content creation trends showing 342% year-over-year adoption increases.

The “ingredients to video” feature—Google’s term for reference image-based generation—enables brand consistency across campaigns. Upload product shots, character designs, or style references, then generate multiple video variations maintaining visual coherence. This addresses a common challenge where AI video tools produce inconsistent results requiring extensive regeneration cycles.

Pricing and Accessibility

Veo 3.1’s vertical video and enhanced reference capabilities are available immediately to Google AI Plus, Pro ($19.99/month), and Ultra ($249.99/month) subscribers. The tiered pricing reflects compute intensity: Pro subscribers receive moderate daily limits, while Ultra unlocks higher generation quotas suitable for professional workflows.

API access through Google AI Studio and Vertex AI uses per-second pricing estimated at $0.40/second for Veo 3.1 and $0.15/second for Veo 3.1 Fast (a lower-quality, faster variant). OpenAI has not published comparable Sora 2 API pricing, keeping the model primarily in invite-only app access tied to ChatGPT subscriptions.
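
At those rates, rough budgeting is simple arithmetic. A back-of-the-envelope sketch using the estimated figures quoted above (official pricing may differ):

```python
# Back-of-the-envelope cost estimates using the per-second rates quoted above.
# Treat these as estimates; check Google's current pricing before budgeting.
VEO_31_PER_SEC = 0.40       # USD per generated second, Veo 3.1
VEO_31_FAST_PER_SEC = 0.15  # USD per generated second, Veo 3.1 Fast

CLIP_SECONDS = 8        # one generation
SEQUENCE_SECONDS = 148  # maximum chained sequence length

for label, rate in [("Veo 3.1", VEO_31_PER_SEC), ("Veo 3.1 Fast", VEO_31_FAST_PER_SEC)]:
    print(f"{label}: ${rate * CLIP_SECONDS:.2f} per 8s clip, "
          f"${rate * SEQUENCE_SECONDS:.2f} per 148s chained sequence")
# Veo 3.1: $3.20 per 8s clip, $59.20 per 148s chained sequence
# Veo 3.1 Fast: $1.20 per 8s clip, $22.20 per 148s chained sequence
```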

Geographic restrictions apply: video generation from photos remains unavailable in the European Economic Area, Switzerland, and the United Kingdom due to regulatory considerations. Users in these regions can access text-to-video generation but cannot leverage the reference image capabilities central to today’s update.

Looking Forward: The 2026 AI Video Trajectory

Industry analysts predict 2026 will bring real-time video generation (prompt to video in seconds), multi-modal models that unify text, image, audio, and video understanding, and autonomous AI video agents capable of executing entire production workflows from high-level objectives. The vertical video update positions Google to capitalize on these trends by establishing ecosystem integration before competitors.

The market’s evolution mirrors broader AI development patterns: initial focus on technical capability (can it generate video?), followed by quality improvements (is it believable?), now entering workflow integration (does it fit how people actually work?). Google’s vertical video support and simplified prompting represent the third phase—making AI video generation practical for mainstream adoption rather than experimental novelty.

Remaining challenges include copyright clarity for generated content, ethical concerns around deepfakes despite watermarking, and regulatory uncertainty varying by jurisdiction. The AI video generation market faces growing scrutiny over authenticity labeling, particularly for political content and synthetic spokespersons. Governments are developing frameworks to govern synthetic content creation and distribution, potentially slowing adoption in heavily regulated sectors.

For creators evaluating Veo 3.1 versus Sora 2: choose Veo 3.1 for narrative-driven content requiring multi-shot consistency, mobile-first vertical formats, and integration with existing Google workflows. Choose Sora 2 for ultra-realistic motion physics, short-form viral content, and cinematic single-shot compositions. The market increasingly accommodates both approaches, with platforms like Media.io and Higgsfield offering side-by-side testing of multiple AI video models.

As the AI video generation market heads toward projections of $2.5 billion to $9.3 billion by 2033, vertical video support and simplified prompting represent Google’s bet that convenience and ecosystem integration will matter more than marginal quality differences, a strategy that mirrors its historical approach to consumer AI adoption.
