The Smokejumpers: Scaling Gemini’s serving infrastructure

Host Logan Kilpatrick sits down with Ema Taropa, lead of the “Smokejumpers” team, to discuss the infrastructure and serving work behind Gemini. They dive into what it actually takes to serve models to billions of users, why there is no “easy button” for global scaling, and how the team manages the trade-offs between latency, capacity, and cost.

Their conversation covers the origins of the Smokejumpers team, Google’s vertically integrated TPU strategy, and the high-intensity culture of shipping Gemini 3.0. Learn more about the technical challenges of LLM caching and the human stories behind the engineers keeping Google’s AI systems running 24/7.
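For listeners curious why LLM caching is harder than classic response caching, here is a minimal, illustrative sketch of prefix (KV) caching, a common technique in LLM serving. This is not the approach described in the episode; the `PrefixCache` class, its methods, and the token IDs below are hypothetical simplifications. The core difficulty it illustrates: reuse requires an exact token-level prefix match, so even small prompt differences defeat the cache.

```python
# Illustrative sketch of prefix (KV) caching in an LLM server (assumption:
# not the Smokejumpers' implementation). Reusing attention key/value state
# for a shared prompt prefix avoids recomputing it, but a hit requires an
# exact token-level prefix match.

from typing import Optional


class PrefixCache:
    """Maps token-ID prefixes to (placeholder) KV state."""

    def __init__(self) -> None:
        # A real server would hold accelerator memory handles here;
        # this sketch just stores an opaque object per cached prefix.
        self._entries: dict[tuple[int, ...], object] = {}

    def put(self, tokens: list[int], kv_state: object) -> None:
        self._entries[tuple(tokens)] = kv_state

    def longest_match(self, tokens: list[int]) -> tuple[int, Optional[object]]:
        """Return (matched_length, kv_state) for the longest cached prefix."""
        for length in range(len(tokens), 0, -1):
            kv = self._entries.get(tuple(tokens[:length]))
            if kv is not None:
                return length, kv
        return 0, None


# Usage: only the uncached suffix needs a fresh forward pass.
cache = PrefixCache()
system_prompt = [101, 7592, 2088]          # hypothetical token IDs
cache.put(system_prompt, "kv-for-prefix")  # placeholder KV state

request = system_prompt + [2023, 2003]     # same prefix, new user turn
hit_len, kv = cache.longest_match(request)
print(f"reusing {hit_len} tokens; recomputing {len(request) - hit_len}")
```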

Listen to this podcast:
Apple Podcasts → https://goo.gle/3Bm7QzQ
Spotify → https://goo.gle/3ZL3ADl

Chapters:
0:00 – Intro
1:34 – Scaling distributed systems for Gemini
3:43 – The Smokejumpers team infrastructure
9:33 – Refining model launch strategies
10:30 – Infrastructure trade-offs and global capacity
13:01 – The difficulty of LLM caching
15:09 – Google’s TPU strategy
16:04 – Esprit de corps and collaborative culture
18:01 – Context windows and workspace embeddings
18:14 – The human element of engineering teams
23:16 – Performance and efficiency of Gemini Flash
24:59 – Conclusion

Watch more Release Notes → https://goo.gle/4njokfg
Subscribe to Google for Developers → https://goo.gle/developers

Speakers: Logan Kilpatrick, Ema Taropa
Products Mentioned: Google AI, Gemini
