Transform your agentic apps with voice in Foundry

In this session, we announce a new API that lets developers build voice-enabled agents or add end-to-end voice to existing text-based interfaces. Now in public preview, it combines speech-to-text, text-to-speech, and foundation models such as GPT-4o through a single unified API. Features include 600+ voices, 150+ locales, noise suppression, echo cancellation, interruption detection, and avatar customization for real-time applications.

To learn more, please check out these resources:
* https://aka.ms/build25/plan/BestModelGenAISolution

Speakers:
* Prem Parameswaran
* Dong Li
* Hasya Shah
* Gerald Ertl

Session Information:
This is one of many sessions from the Microsoft Build 2025 event. View more sessions on demand and learn about Microsoft Build at https://build.microsoft.com

BRK144 | English (US) | AI, Copilot & Agents

#MSBuild

Chapters:
00:00:00 – Introduction and Welcome
00:02:21 – Announcement of New Video Translation API
00:20:09 – Interactive Software Renewal Process Begins
00:21:01 – Customer Request for License Downgrade
00:28:30 – Overview of ACMS with a Final Demo
00:32:58 – Multilingual Speech Capabilities
00:33:17 – Roleplay Introduction as Agent ‘Andrew’
00:43:23 – Server-Side Audio Enhancements Like Noise Suppression and Echo Cancellation
00:44:28 – Integration of Avatars in User Interaction