Evaluating AI Models with Microsoft Foundry | MVP Unplugged


Welcome to the next MVP Unplugged, where Microsoft MVPs share real-world projects and insights from the field! In this episode, host Justin Garrett sits down with Microsoft MVP Veronika Kolesnikova to explore how she picks the right AI model for common developer tasks using Evaluations in Microsoft Foundry, complete with query-context-ground-truth datasets, Python SDK workflows, and evaluation metrics. We’ll also explore some fun datasets around hiking, acrobatics, and common developer terminology!

Veronika walks through her full process—from generating datasets with GitHub Copilot, to running multi-model evaluations, to analyzing outputs in the Microsoft Foundry portal. Whether you’re building agents, experimenting with models, or ensuring AI reliability at scale, this episode breaks down a repeatable and practical approach you can use today.
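The query, context, and ground-truth structure Veronika uses is typically stored as a JSON Lines file: one record per line. Here's a minimal sketch of building such a dataset (the field names and sample rows are illustrative assumptions, not her exact schema — see her GitHub repo below for the real thing):

```python
import json

# Illustrative evaluation rows in the query / context / ground-truth shape
# discussed in the episode (field names are assumptions, not the exact schema).
rows = [
    {
        "query": "What is a switchback on a hiking trail?",
        "context": "Switchbacks are zigzag turns that let a trail climb a steep slope gradually.",
        "ground_truth": "A switchback is a zigzag turn that eases a trail's climb up a steep slope.",
    },
    {
        "query": "What does 'idempotent' mean in software engineering?",
        "context": "An idempotent operation produces the same result whether applied once or many times.",
        "ground_truth": "An operation is idempotent if running it repeatedly gives the same result as running it once.",
    },
]

# Evaluation datasets are commonly stored as JSON Lines: one JSON object per line.
with open("eval_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```

A file like this can then be uploaded and versioned in Foundry, as shown in the episode.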

⭐ What You’ll Learn
How to build custom evaluation datasets using AI
How to compare outputs across models like GPT‑5, Grok-4, and Claude Sonnet 4.5
How to run Evaluations programmatically using the new Microsoft Foundry SDK
How to measure AI performance using F1, METEOR, similarity scores, and thresholds
Tips for choosing the right model for your AI agent
Practical debugging and iteration strategies for model quality
How to store and version evaluation datasets in Microsoft Foundry
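The F1 score in the list above is, for QA-style evaluation, the token-overlap F1 between a model's response and the ground truth. A minimal sketch of what that metric computes (an illustration only, not the Foundry SDK's implementation):

```python
from collections import Counter

def token_f1(response: str, ground_truth: str) -> float:
    """Token-level F1: harmonic mean of precision and recall
    over the tokens shared by response and ground truth."""
    pred = response.lower().split()
    gold = ground_truth.lower().split()
    common = Counter(pred) & Counter(gold)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

score = token_f1(
    "a switchback is a zigzag turn",
    "a switchback is a zigzag turn on a trail",
)  # → 0.8 (perfect precision, but recall penalized by the missing tokens)
```

A threshold (say, 0.7) turns scores like this into pass/fail results, which is how the episode compares models side by side.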

👥 Speakers
Veronika Kolesnikova is a Microsoft MVP in AI and a Principal AI Engineer at Liberty Mutual in Boston, MA. She started her career as a QA engineer, moved into software engineering, and more recently into AI engineering. She’s an international public speaker, a co-organizer of the Boston Azure AI user group, and a tech mentor. Follow Veronika on LinkedIn.

Justin Garrett is the host of MVP Unplugged and a Principal PM in Developer Relations, part of Microsoft Cloud + AI. His 20-year career at Microsoft spans Windows, Bing, Edge, Web Platform, Students/University Relations, Cloud Advocacy, and most recently leadership of the MVP Program. Follow Justin on LinkedIn.

About MVP Unplugged
AI is reshaping how we work and live. And for developers and technologists alike, the pace of innovation (new tools, new models, patterns and practices, and even culture itself) is accelerating. It can be difficult to know what to learn, what to prioritize, and what truly lives up to the promise of unlocking creativity and boosting productivity. Join Justin Garrett, Principal PM in DevRel and leader of the Microsoft MVP Program, as he speaks with MVPs about what they’re learning through a real-world project in this conversational series. In each episode, they experiment, code, and share honest insights that can make a real difference for the audience. Justin and his guests share stories of navigating technological change and look ahead to what’s next in tech. Come discover with us how to thrive in this era of AI!

✅ Chapter Markers
00:00 – Intro to MVP Unplugged
00:20 – Meet Microsoft MVP Veronika Kolesnikova
01:30 – Becoming an MVP: Veronika’s Journey
02:20 – Introducing Microsoft Foundry Evaluations
03:03 – Circus, Hiking & Engineering: Creating AI Datasets
04:55 – Inside the Jupyter Notebooks & Evaluation Setup
06:11 – Connecting to Azure AI Projects
07:55 – Dataset Structure: Query, Context, Ground Truth
09:14 – Why Evaluations Matter for Real AI Projects
10:42 – Exploring the Foundry UI (Classic + New Portal)
12:04 – Uploading and Versioning Datasets in Foundry
13:37 – Evaluation Results: GPT‑5, Claude, Grok
18:29 – Thresholds, System Prompts & Model Behavior
21:07 – Deep Dive: Claude vs GPT‑5 Performance
23:17 – Short Answers vs Long Answers & Scoring
24:20 – Circus Dataset Analysis
25:25 – Software Engineering Dataset Results
27:51 – Documentation & Learning Resources
29:04 – Running Evaluations with the New Foundry SDK
31:12 – Differences Between Old & New SDK
32:17 – How Veronika Chooses the Best Model
35:22 – GitHub Copilot for Model Testing
36:28 – Microsoft Learn Resources
37:26 – What Veronika Wants AI To Do Next
38:32 – Final Advice for Developers
39:03 – Closing

🔗 Resources & Links
🎁Free Microsoft Foundry Trial
https://aka.ms/devrelft
📚 Microsoft Foundry Observability
https://learn.microsoft.com/azure/ai-foundry/concepts/observability
🧪 Foundry Model Leaderboard
https://ai.azure.com/explore/models/leaderboard
📘 Evaluating AI Models
https://learn.microsoft.com/azure/ai-services/foundry/evaluations
💻 Veronika’s GitHub Repo (Evaluation Project)
https://github.com/Veroni4ka/RAI_notebooks/
🚀 Try GitHub Copilot
https://github.com/features/copilot

🔔 Subscribe for more MVP stories, AI engineering walkthroughs, and hands‑on Microsoft developer content!

#microsoftdeveloper #MVPUnplugged #MicrosoftFoundry #AIEvaluations #AzureAI #GitHubCopilot #AIModels #MachineLearning #Claude #Grok #GPT5 #DeveloperCommunity #AIEngineering #JupyterNotebooks #PythonDevelopers
