AI Startup Spotlight: Evaluating Multi-Turn AI Agents with Azure AI Foundry and


As AI evolves from RAG to complex agents, effective evaluation becomes increasingly critical, and traditional evaluation approaches fall short. As you build advanced AI applications with Azure AI Foundry, the GenAI measurement problem becomes even more acute. In this session, we’ll walk you through key considerations and best practices for evaluating AI agents: evaluating the LLM Planner, evaluating the final response, and checking tool selection across the chain for both accuracy and efficiency. Leave with practical strategies for implementing evaluation pipelines that grow smarter through human feedback and autonomous learning.

To learn more, please check out these resources:
* https://aka.ms/build25/plan/CreateAgenticAISolutions

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀:
* Yash Sheth

𝗦𝗲𝘀𝘀𝗶𝗼𝗻 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻:
This is one of many sessions from the Microsoft Build 2025 event. View even more sessions on-demand and learn about Microsoft Build at https://build.microsoft.com

DEM593 | English (US) | AI, Copilot & Agents

#MSBuild

Chapters:
00:00:00 – Company Size and Growth
00:00:21 – Collaborations with Enterprise Companies
00:00:35 – Focus on Multi-Turn Agents and Evaluations
00:04:54 – Example Introduction: World’s Best Travel Agent Built on Azure AI Foundry
00:05:15 – Functional Details of the Planner Agent
00:07:32 – Introduction to Agent Metrics
00:09:29 – Action Advancement and Turn Frequency
00:10:40 – Introduction to Outcome-Based Metrics and Workflow Routing
00:12:18 – Galileo’s Implementation of Agent Reliability and Distributed Tracing Capabilities