Faster AI Responses with Semantic Caching in Azure Managed Redis


AI apps are only as efficient as the way they handle repeated intent. In this demo, see how semantic caching with Azure Managed Redis identifies similar requests and serves responses straight from the cache, without reprocessing tokens. Compare a standard LLM workflow to a cached experience in real time, and learn how vector similarity search reduces latency, cuts token usage, and lowers cost for scalable AI apps.
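As a rough illustration of the idea (not the code from the session, and not the Azure Managed Redis API), the sketch below keeps an in-memory list as a stand-in for the vector index. The names `SemanticCache`, `toy_embed`, and `answer`, the tiny vocabulary, and the 0.8 threshold are all assumptions made for this example; a real app would use an embedding model and a vector index in Redis instead.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]

# Hypothetical toy vocabulary; a real app would call an embedding model instead.
VOCAB = ["how", "do", "i", "reset", "change", "my", "password", "refund", "order"]


def toy_embed(prompt: str) -> Vector:
    """Count vocabulary words in the prompt (illustrative stand-in for embeddings)."""
    words = prompt.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """In-memory stand-in for a vector index; assumed design, not a Redis client."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.8):
        self.embed = embed
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return best_response
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


def answer(prompt: str, cache: SemanticCache,
           llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (response, cache_hit); the LLM is only called on a miss."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True
    response = llm(prompt)
    cache.store(prompt, response)
    return response, False
```

With this toy embedding, "how do i reset my password" and "how do i change my password" share five of six words, so the second prompt clears the 0.8 threshold and reuses the first response. Raising the threshold makes the cache stricter (fewer hits, less risk of a mismatched answer); lowering it does the opposite.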

To learn more, please check out these resources:
* https://aka.ms/build26-next-steps

Speakers:
* Philip Laussermair

Session Information:
This is one of many sessions from the Microsoft Build 2026 event. View even more sessions on-demand and learn about Microsoft Build at https://build.microsoft.com

OD823 | English (US) | Cloud platform & data

Pre-recorded | (200) Intermediate

#MSBuild

Chapters:
00:00:00 – Introduction to Azure Managed Redis and its role in AI app development
00:01:10 – Overview of Azure Managed Redis features and advantages
00:02:37 – Understanding AI agents and production challenges
00:05:18 – Key challenges: Cost management and memory/context in AI agents
00:06:27 – Transition to Philip illustrating semantic caching use case
00:06:37 – Redis Search module as the secret sauce for vector similarity search
00:10:58 – Live demo: Semantic caching reducing token usage and improving performance
00:15:18 – Exploring vector similarity thresholds and cache configurations
00:19:00 – Semantic caching ROI and cost savings analysis
00:19:57 – Roy explains agent memory using Azure Managed Redis with demo and wrap-up