In this session, we announce a new API that lets developers build voice-enabled agents or add end-to-end voice to text-based interfaces. Now in public preview, it combines speech-to-text, text-to-speech, and foundation models such as GPT-4o through a unified API. Features include 600+ voices, 150+ locales, noise suppression, echo cancellation, interruption detection, and avatar customization for real-time applications.
To learn more, please check out these resources:
* https://aka.ms/build25/plan/BestModelGenAISolution
Speakers:
* Prem Parameswaran
* Dong Li
* Hasya Shah
* Gerald Ertl
Session Information:
This is one of many sessions from the Microsoft Build 2025 event. View even more sessions on-demand and learn about Microsoft Build at https://build.microsoft.com
BRK144 | English (US) | AI, Copilot & Agents
#MSBuild
Chapters:
00:00:00 – Introduction and Welcome
00:02:21 – Announcement of New Video Translation API
00:20:09 – Interactive Software Renewal Process Begins
00:21:01 – Customer Request for License Downgrade
00:28:30 – Overview of ACMS with a Final Demo
00:32:58 – Multilingual Speech Capabilities
00:33:17 – Roleplay Introduction as Agent ‘Andrew’
00:43:23 – Server-side audio enhancements like noise suppression and echo cancellation
00:44:28 – Integration of avatars in user interaction