Microsoft Teams maintains audio quality even in challenging network scenarios

Choppy sound. Frozen video. It’s likely everyone has experienced poor meeting and call quality due to a bad internet connection. The growth in hybrid work has made this a more universal challenge, as organizations truly depend on video conferencing tools for continuity, flexibility, and inclusivity. While Microsoft Teams cannot improve the stability of your network, our AI-based innovations help to minimize and even eliminate the negative impact bandwidth constraints can have on your Teams experiences.

During online calls and meetings, audio is sent across networks in data “packets”. When network quality is poor, these packets can get lost, resulting in distorted speech. Packet Loss Concealment, or PLC, is a technique that fills the gaps left by lost data by making assumptions about the missing content. As you can imagine, this is a simplified explanation of an incredibly complex solution to distorted speech. New AI-based enhancements to Teams’ PLC allow concealment of longer gaps with greater accuracy.
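To make the idea of concealment concrete, here is a minimal sketch of a very simple classical concealer: when a packet is missing, it replays the last received frame at a lower volume. This is not the Teams implementation. The function name `conceal_by_repetition`, the `fade` factor, and the representation of a lost packet as `None` are all assumptions made for illustration.

```python
from typing import List, Optional

Frame = List[float]  # one packet's worth of audio samples (illustrative type)


def conceal_by_repetition(
    frames: List[Optional[Frame]],
    frame_len: int,
    fade: float = 0.5,
) -> List[Frame]:
    """Replace each lost frame (None) with a faded copy of the last received frame.

    Hypothetical baseline for illustration only; real concealers are far
    more sophisticated.
    """
    output: List[Frame] = []
    last: Frame = [0.0] * frame_len  # silence until the first packet arrives
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Packet arrived: play it as-is and reset the fade.
            last = frame
            gain = 1.0
            output.append(list(frame))
        else:
            # Packet lost: repeat the last good frame, quieter each time.
            gain *= fade
            output.append([sample * gain for sample in last])
    return output
```

A repetition scheme like this tends to hold up for one or two missing packets, but the guess gets worse the longer the gap lasts, which is the limitation a predictive model aims to address.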

The demo below shows how the new machine learning model for PLC improves the Teams meeting experience. To create it, we simulated bad network conditions and recorded a Microsoft Teams call.

To achieve this, we trained a deep-learning model to predict and “fill” the missing audio based on real-time analysis of the audio that came before it. While traditional concealers do a good job on short gaps, like 20 or 40 milliseconds, the AI model in Teams can predict up to 80 consecutive milliseconds of audio, which makes common packet loss undetectable to Teams users.
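The difference between a 40 millisecond and an 80 millisecond budget is easier to see with a little arithmetic. The sketch below takes a per-packet received/lost trace, measures each run of consecutive losses, and reports what share of those runs fits inside a given concealment budget. The 20 ms packet duration, the function names, and the sample trace are assumptions for illustration and are not taken from Teams.

```python
from typing import List, Tuple


def loss_bursts_ms(received: List[bool], packet_ms: int = 20) -> List[Tuple[int, int]]:
    """Return (start_ms, duration_ms) for each run of consecutive lost packets.

    `packet_ms` is an assumed packet duration, not a documented Teams value.
    """
    bursts: List[Tuple[int, int]] = []
    run_start = None
    for i, ok in enumerate(received):
        if not ok and run_start is None:
            run_start = i  # a new burst of losses begins
        elif ok and run_start is not None:
            bursts.append((run_start * packet_ms, (i - run_start) * packet_ms))
            run_start = None
    if run_start is not None:
        # The trace ended while packets were still being lost.
        bursts.append((run_start * packet_ms, (len(received) - run_start) * packet_ms))
    return bursts


def fraction_within_budget(bursts: List[Tuple[int, int]], budget_ms: int) -> float:
    """Share of loss bursts whose duration does not exceed the budget."""
    if not bursts:
        return 1.0
    covered = sum(1 for _, duration in bursts if duration <= budget_ms)
    return covered / len(bursts)


# Made-up trace: True = packet received, False = packet lost.
trace = [True, False, True] + [False] * 4 + [True] + [False] * 5
bursts = loss_bursts_ms(trace)
short_budget = fraction_within_budget(bursts, 40)
long_budget = fraction_within_budget(bursts, 80)
```

In this made-up trace there are three bursts of 20, 80, and 100 milliseconds, so a 40 ms budget covers one of the three while an 80 ms budget covers two.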

Teams’ PLC AI model has been trained and tested on 600 hours of open-sourced audio data, such as people reading books and participating in podcasts. For testing purposes, we also collected millions of anonymous network samples, or “traces”, from Teams calls to ensure a good representation of all possible network conditions. The best-performing model was then selected out of hundreds of candidates.

While technical analysis enables us to measure the effectiveness of these improvements, ultimately users validate their impact. Post-release, participants in Teams calls with poor networks reported distorted speech 15% less frequently.

To reflect the importance of this work in advancing telephony, Microsoft organized the Audio Deep Packet Loss Concealment Challenge at INTERSPEECH 2022, whose participants included researchers and practitioners from all over the world. As part of the competition, Microsoft open-sourced the network traces we collected as well as a “PLC Mean Opinion Score Model”, so universities and individual researchers could benefit from real-world datasets and human-rated audio files that make model development and evaluation more accessible.

Microsoft Teams users on Windows can now experience the benefits of these PLC improvements, which run locally and only during instances of poor network quality to avoid unnecessary CPU load on users’ machines. We are also testing this system on Mac devices and will soon expand to mobile Teams clients.
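One way to picture "only during instances of poor network quality" is a gate that tracks the recent loss rate and switches the heavier model on only when that rate crosses a threshold. The sketch below is a guess at the general shape of such a gate, not a description of how Teams decides. The class name `ConcealerGate`, the window size, and the threshold are all hypothetical.

```python
from collections import deque


class ConcealerGate:
    """Decide whether to run an expensive concealer, based on recent packet loss.

    Hypothetical illustration: the window and threshold values are assumptions.
    """

    def __init__(self, window: int = 50, threshold: float = 0.02) -> None:
        self._recent = deque(maxlen=window)  # True = received, False = lost
        self._threshold = threshold

    def observe(self, received: bool) -> None:
        """Record the outcome for the most recent packet."""
        self._recent.append(received)

    @property
    def loss_rate(self) -> float:
        """Fraction of packets lost within the sliding window."""
        if not self._recent:
            return 0.0
        return self._recent.count(False) / len(self._recent)

    def should_run_model(self) -> bool:
        """True only while the recent loss rate exceeds the threshold."""
        return self.loss_rate > self._threshold


gate = ConcealerGate(window=10, threshold=0.1)
for _ in range(10):
    gate.observe(True)
healthy = gate.should_run_model()   # no recent loss, so the model stays off

gate.observe(False)
gate.observe(False)
degraded = gate.should_run_model()  # 2 of the last 10 packets lost
```

Because the window slides, the gate turns itself off again once enough packets arrive cleanly, so the extra CPU cost is paid only while the network is struggling.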