Building real-time voice applications with Live API


Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive into the Gemini Live API, Google’s real-time, multimodal interface for developers. Learn how native audio, alongside new capabilities like proactive audio and async function calling, unlocks the unique power of audio as an interface.

0:00 – Intro
1:18 – Live API overview
3:36 – Why audio is a special modality
5:07 – Speed vs. precision in audio
6:17 – Controllable and promptable TTS
8:31 – What developers are building with the Live API
11:14 – URL context and async calling features
15:02 – Proactive audio and affective dialog
16:55 – Addressing developer feedback
21:54 – Live API roadmap
23:49 – The role of long context
24:57 – What’s next for the Live API
26:41 – State of the AI audio market
30:10 – Advice for developers getting started with the Live API
31:16 – Live API demo
38:10 – Demo wrap up and closing

Listen to this podcast:
Apple Podcasts → https://goo.gle/3Bm7QzQ
Spotify → https://goo.gle/3ZL3ADl

Watch more Release Notes → https://goo.gle/4njokfg
Subscribe to Google for Developers → https://goo.gle/developers

Speakers: Logan Kilpatrick, Shrestha Basu Mallick
Products Mentioned: Google AI
