Gemini Omni Is Here: Google's New AI Video Model Goes Official at I/O 2026

Gemini Omni Flash launches today: chat-based video editing, physics-accurate generation, multimodal inputs (image, audio, video, text), and digital avatars. Full breakdown plus when it lands on GenFire.

GenFire Team · May 20, 2026 · 10 min read

Google Just Shipped the Most Ambitious Video Model of 2026

Eight days after the leak, Google made it official. At I/O 2026, Koray Kavukcuoglu — CTO of Google DeepMind and Chief AI Architect at Google — introduced Gemini Omni, the first model in a new family that, in his words, "can create anything from any input — starting with video."

The first variant out the door is Gemini Omni Flash — the most ambitious attempt yet to collapse video generation, video editing, and multimodal reasoning into a single model you talk to in plain language.

Here's what shipped, what makes it different from Veo, Seedance 2.0, and Sora 2 — and what it means for creators using GenFire.

Omni montage — the official launch sizzle reel from Google DeepMind.

What Gemini Omni Actually Is

Omni is not just a new video generator. It's the first model in a planned multimodal generation family from Google DeepMind — one model that natively accepts images, audio, video, and text as inputs, and produces video as its first output modality. Google has confirmed that image and audio outputs are coming next under the same Omni umbrella.

What makes it materially different from Veo, Seedance, Sora, or Kling:

Chat-based editing across multiple turns. Every instruction builds on the last one. Characters stay consistent, physics holds up, and the scene "remembers" what came before.
Multimodal references. Combine image + audio + video + text in a single prompt. The first release supports voice references for audio, with more audio input types rolling out soon.
Real-world grounding. Omni draws on Gemini's general world knowledge of history, science, physics, and culture — not just visual pattern matching.
SynthID watermarking by default. Every clip carries an invisible content-provenance watermark.

Edit Your Videos Through Conversation

This is the headline feature — and the one that genuinely separates Omni from every other video model on the market. You start with a clip (generated or uploaded), then iterate by talking to it. Each turn is a new edit; the prior context stays intact.

Transform the world around you

Change one thing, or change everything. Your video becomes the starting point for something you couldn't have filmed.

Prompt: "Make the sculpture out of bubbles." Omni rebuilds the entire form from foam while preserving lighting and motion.

Reimagine the action

Take a clip you already shot and ask Omni to rewrite what happens inside it.

Prompt: "When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material."

Prompt: "Dim the lights in the room. Put a black and white checkerboard room inside a glass sphere that floats above the hand, creating an infinite recursive of rooms. Camera slowly gets closer into the sphere, creating a video loop."

Prompt: "The lights of the apartments start turning on in sync with the music."

Refine across multiple turns

This is the part to actually pay attention to. The same scene, edited four times in a row, never loses the thread:

Turn 1 — Prompt: "A video of a violinist playing a song."

Turn 2 — Prompt: "Transport the violinist to the image environment." Same character, same performance, new world.

Turn 3 — Prompt: "Make the violin invisible." Notice how the hand position and motion stay locked to the music.

Turn 4 — Prompt: "Change the camera angle to be over the violinist's shoulder." Now the previous three edits all hold from a brand-new viewpoint.

This is the single best demo of true multi-turn video editing we've ever seen from a major lab. Until now, every "edit" with Veo, Sora, or Kling has meant regenerating from scratch and hoping for consistency. Omni is genuinely editing.
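To make the "never loses the thread" idea concrete, here is a minimal sketch of how a client might track a multi-turn session, assuming each new instruction inherits every earlier one. The `EditSession` and `EditTurn` names and the `environment.png` file name are our own placeholders, not part of any published Omni API.

```python
from dataclasses import dataclass, field


@dataclass
class EditTurn:
    """One instruction in a chat-style editing session."""
    prompt: str
    references: tuple[str, ...] = ()  # optional attached files (hypothetical names)


@dataclass
class EditSession:
    """Hypothetical container that keeps every turn so later edits inherit earlier ones."""
    turns: list[EditTurn] = field(default_factory=list)

    def add_turn(self, prompt: str, *references: str) -> int:
        """Append a turn and return its 1-based turn number."""
        self.turns.append(EditTurn(prompt, tuple(references)))
        return len(self.turns)

    def context_for(self, turn_number: int) -> list[str]:
        """Prompts that a given turn builds on: all earlier turns, in order."""
        return [turn.prompt for turn in self.turns[: turn_number - 1]]


session = EditSession()
session.add_turn("A video of a violinist playing a song.")
session.add_turn("Transport the violinist to the image environment.", "environment.png")
session.add_turn("Make the violin invisible.")
last = session.add_turn("Change the camera angle to be over the violinist's shoulder.")

print(last)                            # 4
print(len(session.context_for(last)))  # 3
```

The point of the sketch is the shape of the data: turn 4 is not a fresh prompt, it is a small instruction evaluated against the three that came before it.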

Bring Ideas to Life, Grounded in World Knowledge

Omni doesn't just render — it reasons. According to Google, the model has an improved intuitive grasp of gravity, kinetic energy, and fluid dynamics, plus access to Gemini's knowledge of history, science, and cultural context.

Physics that actually behaves

Prompt: "A marble rolling fast on a chain reaction style track, continuous smooth shot." Watch the momentum carry through the track transitions.

Complex concepts as visuals

Prompt: "Claymation explainer of protein folding, everything is made out of clay, no hands, stop motion, accurate." Omni produces a viewable educational sequence from a one-line brief.

Create Videos From Any Combination of Inputs

The "Omni" in the name is literal. You can reference images, text, video, and audio in a single prompt. At launch, audio is limited to voice references; broader audio support is coming.

Sci-fi style transfer combining a reference image, a reference video, and a music track. Prompt: "Dynamic sci-fi film style video based on image_0.png. Elements light up similar to video_0.mp4 synchronized to the beat of the music from audio_0.wav."
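Prompts like the one above refer to attachments by file name, so a simple pre-flight check is to confirm that every file the prompt mentions is actually attached. The sketch below assumes the `image_N`, `video_N`, `audio_N` naming seen in that prompt; the helper names are ours, and nothing here reflects how Omni itself resolves references.

```python
import re

# Matches names shaped like image_0.png, video_0.mp4, audio_0.wav (assumed convention).
REFERENCE_PATTERN = re.compile(r"\b(?:image|video|audio)_\d+\.\w+\b")


def referenced_files(prompt: str) -> list[str]:
    """Return reference file names mentioned in the prompt, in order of appearance."""
    return REFERENCE_PATTERN.findall(prompt)


def missing_attachments(prompt: str, attached: set[str]) -> list[str]:
    """Return files the prompt mentions that are not in the attached set."""
    return [name for name in referenced_files(prompt) if name not in attached]


prompt = (
    "Dynamic sci-fi film style video based on image_0.png. "
    "Elements light up similar to video_0.mp4 synchronized to the beat "
    "of the music from audio_0.wav."
)

print(referenced_files(prompt))
# ['image_0.png', 'video_0.mp4', 'audio_0.wav']
print(missing_attachments(prompt, {"image_0.png", "video_0.mp4"}))
# ['audio_0.wav']
```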

Multi-reference style-shift walk cycle. The character comes from an image, the camera motion from a video reference, the style transitions from audio.

Prompt: "Add harp sounds synchronized to when I touch each fern leaf. Change the leaf structure to all resemble semi-translucent 3D bioluminescent plant life, with bioluminescent fireflies flying around it that react as I play, in sync with the sounds."

Reference-driven world change. Prompt: "Imagine the world gradually changing into retro-futuristic style as I walk. Use the audio for a retro-futuristic background music."

Drawing-to-realism. Prompt: "Turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video."

Pose + style transfer across three references. Prompt: "Apply the pose and motion from input video to provided character from this image. Apply style from image reference to the new video."

Motion-effects edit. Prompt: "Edit this keeping everything the same. Add animated motion effects coming out of the skateboard."

Motion abstraction from a video reference. Prompt: "Apply the motion of the whale swimming from the provided video to the provided image of fluid reflective material. Do not show the whale or water; instead, have this reflective moving material form a shape that resembles the whale as it swims."

Avatars: Your Voice, Your Face, Your Videos

Omni ships with Avatars on day one: a way to generate videos that look and sound like you, using your own voice as a reference. Google flagged this clearly as a responsibility-first feature. Voice and likeness are user-controlled, and Google is intentionally holding back broader audio-editing capabilities for now while it studies the misuse surface.

Every Omni output also carries an invisible SynthID watermark for content provenance — a meaningful step, and a notable contrast with several competitors that don't watermark at all.

How Omni Compares to Seedance 2.0, Veo 3.1, and Sora 2

Capability	Gemini Omni Flash	Seedance 2.0	Veo 3.1	Sora 2
Text-to-Video	✅	✅	✅	✅
Image-to-Video	✅	✅	✅	✅
Audio Input (voice)	✅	❌	❌	❌
Audio Input (music/SFX)	Coming soon	❌	❌	❌
Native Audio Output	Coming soon	✅	✅	❌
Chat-Based Multi-Turn Editing	✅	❌	❌	❌
Multi-Reference Composition	✅	Partial	Partial	Partial
Digital Avatars	✅	❌	❌	❌
World-Knowledge Grounding	✅ (Gemini)	❌	Limited	❌
Cinematic Fidelity	Strong	Best in class	Strong	Strong
On-Screen Text Quality	Strong	Moderate	Moderate	Moderate
Content Provenance	SynthID (default)	❌	SynthID	C2PA

The pattern is clear. Seedance 2.0 still owns raw cinematic fidelity. Omni doesn't try to beat it there. What Omni does is change the shape of the product — generation, iteration, and editing all happen in one chat thread with one model. That's a different workflow than anything else on the market.

What This Means for GenFire

GenFire is built on a simple idea: creators shouldn't have to pick one model. Different shots, different briefs, different budgets call for different tools. That's why GenFire's Video Studio and Storyboard already route between Seedance 2.0, Veo 3.1, Kling V3, Sora 2, Happy Horse, and 15+ other models.

Here's how Omni slots in:

Omni lands in GenFire the moment it's available

As soon as Omni can be integrated, GenFire's model dispatch layer picks it up: a new entry in the dispatcher, a new card in Video Studio, and a new option in the Storyboard director. Historically, we have shipped integrations within 48 hours of availability.
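As a rough illustration of what "a new entry in the dispatcher" can mean, here is a minimal capability registry. It is a simplified sketch, not GenFire's actual dispatch code: the `ModelDispatcher` class and the capability labels are hypothetical, with the capability sets loosely taken from the comparison table earlier in this post.

```python
class ModelDispatcher:
    """Hypothetical registry mapping model names to the capabilities they offer."""

    def __init__(self) -> None:
        self._models: dict[str, frozenset[str]] = {}

    def register(self, name: str, capabilities: set[str]) -> None:
        """Add a model, or replace its capability set if it is already registered."""
        self._models[name] = frozenset(capabilities)

    def candidates(self, required: set[str]) -> list[str]:
        """Models whose capabilities cover everything required, in registration order."""
        return [name for name, caps in self._models.items() if required <= caps]


dispatcher = ModelDispatcher()
dispatcher.register("Seedance 2.0", {"text_to_video", "image_to_video", "native_audio_output"})
dispatcher.register("Veo 3.1", {"text_to_video", "image_to_video", "native_audio_output"})
dispatcher.register("Sora 2", {"text_to_video", "image_to_video"})

# Adding a new model is a single registration call.
dispatcher.register(
    "Gemini Omni Flash",
    {"text_to_video", "image_to_video", "voice_audio_input", "multi_turn_editing", "avatars"},
)

print(dispatcher.candidates({"multi_turn_editing"}))
# ['Gemini Omni Flash']
print(dispatcher.candidates({"image_to_video", "native_audio_output"}))
# ['Seedance 2.0', 'Veo 3.1']
```

Routing by declared capability rather than by model name is what lets a new model appear in existing tools without changing the callers.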

Chat-based editing meets GenFire's gallery

Omni's "edit in chat" pattern maps cleanly onto how creators already work in GenFire. Once integrated, you'll be able to:

1. Generate a cinematic establishing shot with Seedance 2.0 (still best-in-class for fidelity).
2. Send a frame to Omni for chat-based revisions: swap a character's outfit, change the time of day, add an effect.
3. Loop the result into a Storyboard, lip-sync flow, or ControlFoley sound design pipeline.

No re-uploading. No re-prompting from scratch. Each model used for what it's actually best at.

Multi-turn editing inside the workflow editor

Omni will be exposed as a node in GenFire's node-based workflow editor — same as every other model. Build a graph that takes a product photo → runs it through Omni for a chat-edited promo clip → layers in ControlFoley sound design → exports for TikTok. All in one pipeline.
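The product-photo graph described above can be pictured as a chain of nodes, each taking an asset and returning a new one. The sketch below uses stub nodes that only record their own name; real nodes would call a model or an exporter. The node names, the `Asset` type, and `product_photo.png` are illustrative assumptions, not GenFire's workflow format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Asset:
    """An item moving through the graph, plus the steps applied to it so far."""
    source: str
    steps: tuple[str, ...] = ()


Node = Callable[[Asset], Asset]


def make_node(name: str) -> Node:
    """Build a stub node that appends its name to the asset's step history."""
    def run(asset: Asset) -> Asset:
        return Asset(asset.source, asset.steps + (name,))
    return run


def run_pipeline(asset: Asset, nodes: list[Node]) -> Asset:
    """Feed the asset through each node in order and return the final result."""
    for node in nodes:
        asset = node(asset)
    return asset


pipeline = [
    make_node("omni_chat_edit"),
    make_node("controlfoley_sound_design"),
    make_node("export_tiktok"),
]
result = run_pipeline(Asset("product_photo.png"), pipeline)

print(result.steps)
# ('omni_chat_edit', 'controlfoley_sound_design', 'export_tiktok')
```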

Avatars play nicely with GenFire's influencer system

GenFire already supports custom influencers with reference images and voice cloning. Omni's Avatar feature is complementary, not competitive — pick Omni when you want Google's avatar pipeline with your voice, or stay in GenFire's influencer pipeline when you need full character control across cinematic shots.

The Bottom Line

Gemini Omni is the first model that takes video editing as seriously as video generation. The multi-turn violin demo alone — same character, same performance, four sequential edits that all hold — is something no other public video model can do today.

It probably won't beat Seedance 2.0 on raw photorealism at launch. It almost certainly will beat everything else on the "change one thing in this clip" workflow that creators do all day, every day.

Omni joins the 20+ models already live in GenFire as soon as it's available to integrate. One subscription, every model, all the time — pick the right tool for each shot, instead of being locked into whoever ships first.

Create a free GenFire account to be ready when Omni lands. Starter credits included, no credit card required.

All video clips in this post are © Google / Google DeepMind, originally published with the Gemini Omni announcement on May 19, 2026. Embedded here for editorial commentary.
