Episode 4: Indirect Prompt Injection Explained | AI Red Teaming 101

Post Content

Welcome back to AI Red Teaming 101!

In this episode, Gary Lopez, Principal Offensive AI Scientist at Microsoft’s ADAPT team, walks us through indirect prompt injection—a stealthy and powerful attack vector where malicious instructions are hidden in external data sources like emails, websites, or databases.

Gary explains how these attacks work, why they’re so dangerous, and demonstrates a real-world example using a summarization bot that gets hijacked into writing poetry in Spanish. You’ll also see how to simulate this attack using Microsoft’s red teaming labs.

What You’ll Learn:

The difference between direct and indirect prompt injection
How attackers exploit external data to hijack model behavior
How to try out these attacks using Microsoft’s AI red teaming labs

✅ Chapters:
00:00 – Welcome & episode overview
00:20 – Recap: direct vs. indirect prompt injection
01:00 – How indirect prompt injection works
01:40 – Poisoning external data sources
02:20 – Case study: email summarization attack
03:00 – Why LLMs can’t distinguish instruction sources
03:40 – Live demo: HTML summarization jailbreak
05:00 – Modifying headers to hijack model behavior
06:00 – Summary and key takeaways

✅ Links & Resources:
AI Red Teaming 101 Episodes: aka.ms/airt101
AI Red Teaming 101 Labs & Tools: aka.ms/airtlabs
Microsoft AI Red Team Overview: aka.ms/airedteam

✅ Speakers:
Amanda Minnich – Principal Research Manager, Microsoft AI Red Team
LinkedIn: https://www.linkedin.com/in/amandajeanminnich/

Webpage: https://www.amandaminnich.info/

Gary Lopez – Principal Offensive AI Scientist, ADAPT
LinkedIn: https://www.linkedin.com/in/gary-lopez/

#AIRedTeam #AIRT #Microsoft #AI #AISecurity #AIRedTeaming #GenerativeAI #Cybersecurity #InfoSec #cybersecurityawareness #PromptInjection Read More Microsoft Developer