Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive into the Gemini Live API, Google’s real-time, multimodal interface for developers. Learn how native audio, alongside new capabilities like proactive audio and async function calling, unlocks the unique power of audio as an interface.
0:00 – Intro
1:18 – Live API overview
3:36 – Why audio is a special modality
5:07 – Speed vs. precision in audio
6:17 – Controllable and promptable TTS
8:31 – What developers are building with the Live API
11:14 – URL context and async calling features
15:02 – Proactive audio and affective dialog
16:55 – Addressing developer feedback
21:54 – Live API roadmap
23:49 – The role of long context
24:57 – What’s next for the Live API
26:41 – State of the AI audio market
30:10 – Advice for developers getting started with the Live API
31:16 – Live API demo
38:10 – Demo wrap-up and closing
Listen to this podcast:
Apple Podcasts → https://goo.gle/3Bm7QzQ
Spotify → https://goo.gle/3ZL3ADl
Watch more Release Notes → https://goo.gle/4njokfg
Subscribe to Google for Developers → https://goo.gle/developers
Speakers: Logan Kilpatrick, Shrestha Basu Mallick
Products Mentioned: Google AI