Episode 7: Defending Against Attacks: Mitigations and Guardrails

Post Content

Welcome back to AI Red Teaming 101!
In this episode, Gary Lopez, Principal Offensive AI Scientist at Microsoft’s ADAPT team, shifts the focus from offense to defense. Now that we’ve covered how generative AI systems can be attacked, this episode dives into mitigation strategies and guardrail techniques that help secure AI applications against prompt injection and jailbreaks.

Gary introduces Spotlighting, a family of prompt engineering defenses developed by Microsoft in 2024, and walks through three powerful techniques—Delimiting, Data Marking, and Encoding—that help models distinguish between trusted and untrusted inputs.

What You’ll Learn:

How to defend against prompt injection and jailbreak attacks
How Spotlighting helps models separate safe vs. unsafe input
How to implement delimiting, data marking, and encoding in your apps

✅ Chapters:
00:00 – Welcome & episode overview
00:20 – Why mitigation matters
01:00 – Real-world attack scenario: email summarization
01:55 – Spotlighting technique 1: Delimiting
03:11 – Spotlighting technique 2: Data Marking
04:33 – Spotlighting technique 3: Encoding (Base64, ROT13, etc.)
05:56 – Why these techniques work best on capable models
06:40 – Final thoughts & where to learn more

✅ Links & Resources:
AI Red Teaming 101 Episodes: aka.ms/airt101
AI Red Teaming 101 Labs & Tools: aka.ms/airtlabs
Microsoft AI Red Team Overview: aka.ms/airedteam

✅ Speakers:
Amanda Minnich – Principal Research Manager, Microsoft AI Red Team
LinkedIn: https://www.linkedin.com/in/amandajeanminnich/

Webpage: https://www.amandaminnich.info/

Gary Lopez – Principal Offensive AI Scientist, ADAPT
LinkedIn: https://www.linkedin.com/in/gary-lopez/

#AIRedTeam #AIRT #Microsoft #AI #AISecurity #AIRedTeaming #GenerativeAI #Cybersecurity #InfoSec #cybersecurityawareness #ResponsibleAI #AIModelDefense Read More Microsoft Developer