As AI evolves from RAG to complex agents, effective evaluation becomes increasingly critical – and traditional evaluation approaches fall short. As you build advanced AI applications with Azure AI Foundry, the GenAI measurement problem becomes even more acute. In this session, we'll take you through key considerations and best practices when evaluating AI agents, including evaluating the LLM Planner, the final response, and ensuring efficiency as well as accuracy in tool selection across the chain. Leave with practical strategies to implement evaluation pipelines that grow smarter through human feedback and autonomous learning.
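As a rough illustration of the kind of checks such a pipeline can run, here is a minimal, self-contained Python sketch that scores tool-selection accuracy and action advancement over a recorded agent trace. All names and data structures here are hypothetical placeholders, not the Azure AI Foundry or Galileo APIs; see the session for the actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One planner step in a recorded agent trace (hypothetical schema)."""
    expected_tool: str   # tool a human-labeled reference says should be called
    actual_tool: str     # tool the LLM planner actually selected
    advanced_goal: bool  # did this turn move the task forward?

def tool_selection_accuracy(trace: list[Turn]) -> float:
    """Fraction of turns where the planner picked the reference tool."""
    if not trace:
        return 0.0
    return sum(t.expected_tool == t.actual_tool for t in trace) / len(trace)

def action_advancement(trace: list[Turn]) -> float:
    """Fraction of turns that made measurable progress toward the goal."""
    if not trace:
        return 0.0
    return sum(t.advanced_goal for t in trace) / len(trace)

if __name__ == "__main__":
    # Toy trace for a travel-agent style workflow (illustrative only).
    trace = [
        Turn("search_flights", "search_flights", True),
        Turn("search_hotels", "search_flights", False),  # wrong tool, no progress
        Turn("book_hotel", "book_hotel", True),
    ]
    print(f"tool selection accuracy: {tool_selection_accuracy(trace):.2f}")
    print(f"action advancement:      {action_advancement(trace):.2f}")
```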
To learn more, please check out these resources:
* https://aka.ms/build25/plan/CreateAgenticAISolutions
Speakers:
* Yash Sheth
Session Information:
This is one of many sessions from the Microsoft Build 2025 event. View even more sessions on-demand and learn about Microsoft Build at https://build.microsoft.com
DEM593 | English (US) | AI, Copilot & Agents
#MSBuild
Chapters:
00:00:00 – Company Size and Growth
00:00:21 – Collaborations with Enterprise Companies
00:00:35 – Focus on Multi-Turn Agents and Evaluations
00:04:54 – Example Introduction: World’s Best Travel Agent Built on Azure AI Foundry
00:05:15 – Functional Details of the Planner Agent
00:07:32 – Introduction to Agent Metrics
00:09:29 – Action Advancement and Turn Frequency
00:10:40 – Introduction to Outcome-Based Metrics and Workflow Routing
00:12:18 – Galileo's Implementation of Agent Reliability and Distributed Tracing Capabilities