
Gemini Omni — Google's New Video Model, Explained (and Coming Soon to GenFire)

Google's unannounced 'Gemini Omni' video model leaked inside the Gemini app days before I/O 2026. Here's what it does, how it stacks up against Seedance 2.0 and Veo, and when you'll be able to use it on GenFire.

GenFire Team

May 12, 2026 · 8 min read

A Big Leak, Eight Days Before I/O

On May 11, 2026 — eight days before Google I/O — a new model card showed up inside the Gemini mobile app. The description read:

"Create with Gemini Omni: meet our new video model. Remix your videos, edit directly in chat, try a template, and more."

That single string is the first public glimpse of Gemini Omni, an unannounced video model whose early demo clips were good enough to set off a wave of speculation across the AI press. It sits in the Gemini app right next to Toucan, the internal codename for Google's current Veo 3.1-powered video tab. That matches the staging pattern Google has used before swapping an old model out for a new one.

Here's what's actually known, what's worth speculating about, and what it means for creators using GenFire.

What Gemini Omni Appears to Be

The leaked UI strings, model cards, and early sample outputs paint a consistent picture, even if Google hasn't said a word officially yet.

A Unified Multimodal Video Model

The most ambitious interpretation, and the one most consistent with the "Omni" name, is that this isn't just a new text-to-video model. It's a single Gemini model that natively handles text, images, and video (and possibly audio) inside one system.

If so, it would be the first true omni-model with video output from a major lab. Today's state-of-the-art video models (Veo 3.1, Seedance 2.0, Kling 3, Sora 2) are all specialized video generators; none of them also handles image creation or reasoning natively. Omni's pitch, if the leaks are right, is that all of those modalities collapse into one model that you talk to in chat.

Chat-Based Video Editing

This is the part that has reviewers most excited. The leaked demos show Omni doing things that, until now, you'd open a dedicated editor for:

Watermark removal
Object swap inside an existing clip ("change the red car to a blue one")
Scene rewrite via natural-language chat instructions
Templates and remix workflows for fast iteration

Multiple early reviewers flagged the same thing: Omni's editing quality outpaces its raw generation quality. That's a strong hint, though not proof, that it uses a genuinely different architecture from Veo: a unified model that treats video as another editable modality rather than a one-shot render pipeline.

Flash and Pro Tiers

Metadata in the leaked UI suggests Omni ships in Flash and Pro variants, the same tier structure Google uses for Nano Banana, its image model. Google doesn't usually introduce a new tier structure for a simple rename, so this is one of the strongest pieces of evidence that Omni is a genuinely new model rather than a rebranded Veo 4.

Realistic Cost Signals

One user posting screenshots burned 86% of their Gemini Pro daily quota on just two Omni prompts, or roughly 43% per generation. That implies a per-generation cost meaningfully higher than current Veo flows and points to a heavy model. A metered credit system at launch seems likely, with Pro priced as a premium tier above Flash.

What the Leaked Demos Actually Look Like

Two public samples have circulated so far:

1. Chalkboard math proof. Prompt: a professor writes out a trigonometric identity on a traditional chalkboard, explaining each step. Result: legible on-screen text and equations, historically a weak spot for most major video models. Some "obvious tells" remain, but the text-rendering quality is a clear step up.

2. "Will Smith eating spaghetti"-style dining test. Prompt: two men dining at an upscale seaside restaurant. Result: convincingly realistic at first glance, with good lighting, plausible motion, and surprisingly stable hands.

Judging from these two samples, Omni still trails Seedance 2.0 on pure cinematic fidelity, and Seedance 2.0 currently leads the public benchmarks. But on text rendering, prompt adherence, and editing, the early read is that Omni is already competitive or ahead.

How It Stacks Up

Capability	Gemini Omni (leaked)	Seedance 2.0	Veo 3.1	Sora 2
Text-to-Video	✅	✅	✅	✅
Image-to-Video	✅ (expected)	✅	✅	✅
Native Audio	✅ (rumored, multi-track)	✅	✅	✅
Chat-Based Editing	✅ (standout feature)	❌	❌	❌
Watermark / Object Removal	✅	❌	❌	❌
Unified Image + Video Model	✅ (if rumors are right)	❌	❌	❌
Tiered Variants	✅ (Flash / Pro)	✅	✅ (Lite/Standard)	✅ (Sora 2 / Sora 2 Pro)
Cinematic Fidelity	Competitive	Best in class	Strong	Strong
On-Screen Text Quality	Strong	Moderate	Moderate	Moderate

The pattern here is pretty clear: Omni is not trying to beat Seedance 2.0 at being Seedance 2.0. It's trying to be a different shape of product — one where generation, editing, and remixing all live in the same conversation with the same model.

When to Expect It

May 19–20, 2026: Google I/O. Google has used I/O for major Veo announcements twice before (the original Veo in 2024, Veo 3 in 2025). A leak surfacing eight days out looks like intentional staging, though Google hasn't confirmed that.
Public access: Expect a phased rollout. Gemini Pro/Ultra subscribers first, then broader Gemini consumers, then Vertex AI for developers and enterprises.
API availability: Leaked hints suggest Omni may launch as an Agent-style API (similar to Deep Research) rather than a plain generation endpoint — which fits the "edit-in-chat" framing.

Why This Matters for GenFire

GenFire is built on a simple bet: creators shouldn't have to pick one model. Different shots, different styles, different briefs call for different tools. That's why GenFire's Video Studio and Storyboard already route between Seedance 2.0, Veo 3.1, Kling V3, Sora 2, Happy Horse, and more — depending on what each shot needs.

Gemini Omni slots into that lineup naturally, and here's what GenFire users can expect:

Omni Will Land in GenFire Shortly After Public Release

Once Google makes Gemini Omni publicly available, GenFire's integration pipeline will pick it up. Our model dispatch layer is designed for exactly this: a new model becomes a new entry in the dispatcher, a new card in the Video Studio, and a new option in the Storyboard director, usually within days of public availability.
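To make "a new model becomes a new entry in the dispatcher" concrete, here's a minimal sketch of a registry-style dispatch layer, assuming every model is wrapped in one uniform spec. All names below (`ModelSpec`, `ModelRegistry`, the `"gemini-omni"` key, the capability tags, the stubbed `generate` callables) are hypothetical and illustrative; this isn't GenFire's actual code, and no real Google or ByteDance API is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of one video model the platform can call."""
    key: str                        # internal identifier (assumed naming scheme)
    display_name: str               # label shown on a Video Studio card
    capabilities: frozenset[str]    # e.g. {"text_to_video", "chat_edit"} (assumed tags)
    generate: Callable[[str], str]  # stub provider call: prompt -> job id


class ModelRegistry:
    """Single source of truth: adding a model is one register() call."""

    def __init__(self) -> None:
        self._models: dict[str, ModelSpec] = {}

    def register(self, spec: ModelSpec) -> None:
        # Refuse silent overwrites so two integrations can't clash on a key.
        if spec.key in self._models:
            raise ValueError(f"model {spec.key!r} already registered")
        self._models[spec.key] = spec

    def cards(self) -> list[str]:
        # What a UI layer would render as model cards, in registration order.
        return [m.display_name for m in self._models.values()]

    def dispatch(self, key: str, prompt: str) -> str:
        # Look up the model by key and forward the request to its provider stub.
        return self._models[key].generate(prompt)


registry = ModelRegistry()
registry.register(ModelSpec("seedance-2.0", "Seedance 2.0",
                            frozenset({"text_to_video", "image_to_video"}),
                            lambda p: f"seedance-job:{p}"))
# Integrating a newly released model is one more entry; nothing else changes.
registry.register(ModelSpec("gemini-omni", "Gemini Omni",
                            frozenset({"text_to_video", "chat_edit"}),
                            lambda p: f"omni-job:{p}"))

print(registry.cards())
print(registry.dispatch("gemini-omni", "remove the red car"))
```

Because the UI cards and the dispatch path both read from the same registry, adding a model touches one place. A production version would also need authentication, retries, and per-model parameter schemas, all omitted here.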

Chat-Based Editing Fits GenFire's Existing Workflow

GenFire already has a conversational generation surface. Omni's "edit directly in chat" pattern — describe a change, get a new version, iterate — maps cleanly onto the gallery and remix flows GenFire users already know. When Omni ships, you'll be able to:

Generate a clip with Seedance 2.0 or Veo for raw cinematic quality
Send it to Omni for a quick edit — remove an object, swap a background, change a color
Loop the result back into a storyboard, lip-sync flow, or dubbing pipeline

No model lock-in. No re-uploading. Each model used for what it's best at.

Storyboard Integration

GenFire's AI director will route shots to Omni when the scene calls for on-screen text, chat-based revisions, or edit-heavy work — and keep routing cinematic establishing shots to Seedance 2.0 or Veo. That's the whole point of multi-model orchestration: every shot gets the best available model for that shot.
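The routing idea can be sketched as an ordered set of rules over a per-shot brief. This is an illustration under stated assumptions, not GenFire's director: the `Shot` fields, the rule order, and the model keys are all hypothetical, and a real director would presumably weigh many more signals.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical per-shot brief produced by a storyboard planner."""
    description: str
    has_on_screen_text: bool = False
    is_revision: bool = False       # a chat-style edit of an existing clip
    is_establishing: bool = False   # wide cinematic establishing shot


# Assumed model keys; the real director's names and heuristics aren't public.
OMNI = "gemini-omni"
SEEDANCE = "seedance-2.0"
VEO = "veo-3.1"


def cinematic_choice(available: set[str]) -> str:
    # Prefer Seedance 2.0 for raw fidelity, falling back to Veo 3.1.
    return SEEDANCE if SEEDANCE in available else VEO


def route_shot(shot: Shot, available: set[str]) -> str:
    """Pick a model for one shot by checking rules in priority order."""
    # Rule 1: establishing shots stay on a cinematic model.
    if shot.is_establishing:
        return cinematic_choice(available)
    # Rule 2: text-heavy or edit-heavy shots go to Omni when it's available.
    if (shot.has_on_screen_text or shot.is_revision) and OMNI in available:
        return OMNI
    # Rule 3: everything else uses the cinematic default.
    return cinematic_choice(available)


available = {OMNI, SEEDANCE, VEO}
print(route_shot(Shot("professor at chalkboard", has_on_screen_text=True), available))
print(route_shot(Shot("coastline at dawn", is_establishing=True), available))
```

Rule order is a design choice: here an establishing shot stays on the cinematic model even if it contains text, and swapping the first two checks would send title-card establishing shots to Omni instead. The fallback also keeps routing working before Omni is available at all.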

Workflow Editor Support

Omni will be exposed as a node in GenFire's node-based workflow editor, the same way every other model is. Build a pipeline that takes a product photo, runs it through Omni for a chat-edited promo clip, layers in ControlFoley sound design, and exports for TikTok — all in one graph.
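Under the hood, a node graph like that comes down to running each node after its inputs are ready. The sketch below assumes each node is a plain function of its upstream outputs and uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to order execution. The node names and string "artifacts" are hypothetical stand-ins: the `omni_edit` and `foley` nodes don't call Gemini Omni or ControlFoley, and this isn't GenFire's workflow engine.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A node receives {upstream_name: upstream_output} and returns its own output.
NodeFn = Callable[[dict[str, str]], str]

# Hypothetical nodes; strings stand in for real media files.
nodes: dict[str, NodeFn] = {
    "photo": lambda up: "product.jpg",
    # Stand-in for a chat-driven Omni edit that turns the photo into a promo clip.
    "omni_edit": lambda up: f"promo_clip({up['photo']})",
    # Stand-in for a ControlFoley-style sound-design pass keyed to the clip.
    "foley": lambda up: f"foley_audio({up['omni_edit']})",
    # Combine video and audio, then export in a vertical 9:16 format.
    "export": lambda up: f"tiktok_9x16[{up['omni_edit']} + {up['foley']}]",
}

# Edges: each node maps to the set of nodes it depends on.
deps: dict[str, set[str]] = {
    "photo": set(),
    "omni_edit": {"photo"},
    "foley": {"omni_edit"},
    "export": {"omni_edit", "foley"},
}


def run_graph(nodes: dict[str, NodeFn], deps: dict[str, set[str]]) -> dict[str, str]:
    """Execute every node once, after all of its dependencies have produced output."""
    results: dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps.get(name, set())}
        results[name] = nodes[name](upstream)
    return results


outputs = run_graph(nodes, deps)
print(outputs["export"])
```

Because `export` depends on both the edited clip and the audio track, this is a genuine DAG rather than a straight chain. `static_order()` also raises `graphlib.CycleError` if someone wires a loop, which is a cheap safety check for a visual editor.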

What We're Watching at I/O

Five questions Google needs to answer on May 19:

1. Is Omni a new model, a rebrand of Veo 4, or a unified multimodal Gemini variant?
2. Are the chat-based editing capabilities as good in production as they are in demos?
3. What's the Flash vs. Pro pricing structure?
4. Will it ship on Vertex AI, and how fast?
5. How does it benchmark against Seedance 2.0 on motion, text rendering, and audio sync?

We'll update this post after the announcement and ship the integration within days of public availability.

The Bottom Line

Gemini Omni looks like the first serious attempt to collapse video generation and video editing into one model you talk to in plain language. That's a genuinely different product shape from what Seedance, Veo, Sora, or Kling offer today.

It probably won't beat Seedance 2.0 on raw cinematic fidelity at launch. But if the demos hold up in production, it could beat everything else on the "change one thing in this clip" workflow that creators do all day, every day.

When Google flips the switch, GenFire users won't have to wait long: Omni will land alongside the 20+ models already on the platform, ready to be slotted into whatever workflow makes sense for your project. One subscription, every model, all the time.

Create a free GenFire account to be ready the day Omni ships. Starter credits included, no credit card required.

