Dumitru Erhan, co-lead of the Veo project at Google DeepMind, joins host Logan Kilpatrick for a deep dive into the evolution of generative video models. They discuss the journey from early research in 2018 to the launch of the state-of-the-art Veo 3 model with native audio generation. Learn about the technical hurdles in evaluating and scaling video models, the challenges of long-duration video coherence, and how user feedback is shaping the future of AI-powered video creation.
Chapters:
0:00 – Intro
0:37 – Veo project’s beginnings
1:34 – Veo’s origins in Google Brain
2:58 – Video prediction and robotics applications
5:44 – Early progress and evaluation challenges
8:00 – How to evaluate video models
10:19 – Physics-based evaluations and their limitations
12:14 – The launch of the original Veo model
14:04 – Scaling challenges for video models
16:13 – The leap from Veo 1 to Veo 2
17:42 – The role of audio in Veo 3 and early challenges
19:37 – Veo 3’s viral audio moment
21:14 – User trends shaping Veo’s roadmap
23:47 – Image-to-video vs. text-to-video complexity
25:57 – New prompting methods and user control
27:54 – Coherence in long video generation
33:26 – Genie 3 and world models
36:19 – The steerability challenge
42:47 – How video understanding aids generation
45:56 – Capability transfer and image data’s role
48:15 – Future development and user feedback
Watch more Release Notes → https://goo.gle/4njokfg
Subscribe to Google for Developers → https://goo.gle/developers
Speakers: Logan Kilpatrick, Dumitru Erhan
Products Mentioned: Google AI, Gemini