Concentus: Distorted audio after decoding and mixing

Question

I have made a simple VoIP group chat that works perfectly when there is no encoder/decoder; however, as soon as I implement a codec and mix multiple decoded audio streams, there is a ton of noise. If it's just 1-on-1 talking, there is no noise.

This is what I tried. Currently I am recording 48 kHz, 40 ms, 16-bit mono audio and encoding it with the Concentus lib. Code below... (The BytesToShorts and ShortsToBytes methods are direct copies from the Concentus lib's demo.)

private void AudioDataAvailable(object? sender, WaveInEventArgs e)
{
    if (IsDeafened || IsMuted)
        return;

    float max = 0;
    // interpret as 16 bit audio
    for (int index = 0; index < e.BytesRecorded; index += 2)
    {
        short sample = (short)((e.Buffer[index + 1] << 8) |
                                e.Buffer[index + 0]);
        // to floating point
        var sample32 = sample / 32768f;
        // absolute value 
        if (sample32 < 0) sample32 = -sample32;
        if (sample32 > max) max = sample32;
    }

    if (max > 0.08)
    {
        RecordDetection = DateTime.UtcNow;
    }

    if (DateTime.UtcNow.Subtract(RecordDetection).Seconds < 1)
    {
        short[] pcm = BytesToShorts(e.Buffer, 0, e.BytesRecorded);
        byte[] encoded = new byte[1000];
        Encoder.Encode(pcm, 0, 960, encoded, 0, encoded.Length);
        var voicePacket = new VoicePacket()
        {
            PacketAudio = encoded,
            PacketDataIdentifier = PacketIdentifier.Audio,
            PacketVersion = Network.Network.Version,
            PacketBytesRecorded = e.BytesRecorded
        };
        VCClient.Send(voicePacket);
    }
}
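
For reference, the BytesToShorts and ShortsToBytes helpers mentioned above are small little-endian PCM converters. A rough sketch of what the Concentus demo versions do (reproduced from memory, so treat the exact signatures as an assumption):

// Interpret a little-endian 16-bit PCM byte buffer as an array of shorts.
static short[] BytesToShorts(byte[] input, int offset, int length)
{
    short[] processedValues = new short[length / 2];
    for (int i = 0; i < processedValues.Length; i++)
    {
        processedValues[i] = (short)(input[(i * 2) + offset]
            | (input[(i * 2) + 1 + offset] << 8));
    }
    return processedValues;
}

// Write an array of shorts back out as little-endian 16-bit PCM bytes.
static byte[] ShortsToBytes(short[] input, int offset, int length)
{
    byte[] processedValues = new byte[length * 2];
    for (int i = 0; i < length; i++)
    {
        processedValues[i * 2] = (byte)(input[i + offset] & 0xFF);
        processedValues[(i * 2) + 1] = (byte)((input[i + offset] >> 8) & 0xFF);
    }
    return processedValues;
}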

Then on the decode side I do this...

private Task VC_OnAudioReceived(byte[] Audio, string Key, float Volume, int BytesRecorded, float RotationSource)
{
    _ = Task.Factory.StartNew(() =>
    {
        var participant = Participants.FirstOrDefault(x => x.LoginKey == Key);
        if (participant != null)
        {
            try
            {
                short[] decoded = new short[960];
                Decoder.Decode(Audio, 0, Audio.Length, decoded, 0, 960, false);
                byte[] decodedBytes = ShortsToBytes(decoded, 0, decoded.Length);
                participant.FloatProvider.Volume = Volume;
                if (DirectionalAudio)
                {
                    participant.MonoToStereo.LeftVolume = (float)(0.5 + Math.Sin(RotationSource) * 0.5);
                    participant.MonoToStereo.RightVolume = (float)(0.5 - Math.Sin(RotationSource) * 0.5);
                }
                participant.WaveProvider.AddSamples(decodedBytes, 0, BytesRecorded);
            }
            catch { }
        }
    });

    if(!Stopping && AudioPlayer.PlaybackState == PlaybackState.Stopped)
        AudioPlayer.Play();

    return Task.CompletedTask;
}

The participant variable consists of

public class ParticipantModel
{
   public string Name { get; set; } = "";
   public string LoginKey { get; set; } = "";
   public BufferedWaveProvider WaveProvider { get; set; }
   public Wave16ToFloatProvider FloatProvider { get; set; }
   public MonoToStereoSampleProvider MonoToStereo { get; set; }
}

and is created like so...

case VCSignalling_Packet.PacketIdentifier.Login:
   var participant = new ParticipantModel()
   {
      LoginKey = packet.PacketLoginKey,
      Name = packet.PacketName,
      WaveProvider = new BufferedWaveProvider(VoipService.GetRecordFormat) { DiscardOnBufferOverflow = true }
   };
   participant.FloatProvider = new Wave16ToFloatProvider(participant.WaveProvider);
   participant.MonoToStereo = new NAudio.Wave.SampleProviders.MonoToStereoSampleProvider(participant.FloatProvider.ToSampleProvider());

   OnParticipantLogin?.Invoke(participant);
   break;

Record and Playback formats

public static WaveFormat GetRecordFormat { get => new WaveFormat(SampleRate, 16, Channels); }
public static WaveFormat GetAudioFormat { get => WaveFormat.CreateIeeeFloatWaveFormat(SampleRate, Channels * 2); }
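
For reference, at 48 kHz, 16-bit mono, a 40 ms capture buffer holds 48000 × 0.040 = 1920 samples (3840 bytes), while the frame size of 960 samples passed to Encode above corresponds to 20 ms.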

Opus encoder and decoder initializations.

Encoder = new OpusEncoder(SampleRate, Channels, Concentus.Enums.OpusApplication.OPUS_APPLICATION_VOIP);
Encoder.Complexity = 0;
Encoder.UseVBR = true;

Decoder = new OpusDecoder(SampleRate, Channels);

Mixer, Recorder and AudioPlayer

Mixer = new MixingSampleProvider(GetAudioFormat);
Normalizer = new SoftLimiter(Mixer);
Normalizer.Boost.CurrentValue = 5;

AudioRecorder = audioManager.CreateRecorder(GetRecordFormat);
AudioPlayer = audioManager.CreatePlayer(Normalizer);

The SoftLimiter class was taken and implemented from https://www.markheath.net/post/limit-audio-naudio
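
One piece not shown above is how each participant's stream actually reaches the mixer. Presumably the participant's MonoToStereo provider is added as a mixer input on login; a minimal sketch using NAudio's MixingSampleProvider API (the exact hook points are an assumption, since they are not shown in the question):

// On participant login: route the participant's stereo stream into the shared mixer.
// The input's WaveFormat must match the mixer's (stereo IEEE float at SampleRate here).
Mixer.AddMixerInput(participant.MonoToStereo);

// On participant logout: remove the input so the mixer stops pulling from a dead stream.
Mixer.RemoveMixerInput(participant.MonoToStereo);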

I have isolated the problem to the codec. When doing everything without compressing the audio, everything is clear and works fine with 3-5+ people talking at the same time. As soon as a codec is implemented, the mixed audio with 2 people talking at the same time gets distorted and crackly. (I've tried the G.722 codec before.)

I know there is some loss going on in the decoder, but I did not expect it to affect mixing this much. What is going on, and what can I do to fix it? Would I have to do audio cleaning, or is there something I am missing?


Answer 1

Score: 0

I guess it's one of those moments again where you ask a question and then a day later you try something and it works...

Alright, so I figured out why there are tons of distortions when 2 people talk at the same time with an audio codec in the middle. I had to change it so there is one decoder per connected client on the client side. I modified both snippets below...

public class ParticipantModel
{
    public string Name { get; set; } = "";
    public string LoginKey { get; set; } = "";
    public BufferedWaveProvider WaveProvider { get; set; }
    public Wave16ToFloatProvider FloatProvider { get; set; }
    public MonoToStereoSampleProvider MonoToStereo { get; set; }
    public OpusDecoder Decoder { get; set; }
}

private Task VC_OnAudioReceived(byte[] Audio, string Key, float Volume, int BytesRecorded, float RotationSource)
{
    _ = Task.Factory.StartNew(() =>
    {
        var participant = Participants.FirstOrDefault(x => x.LoginKey == Key);
        if (participant != null)
        {
            try
            {
                short[] decoded = new short[960];
                participant.Decoder.Decode(Audio, 0, Audio.Length, decoded, 0, decoded.Length, false);
                byte[] decodedBytes = ShortsToBytes(decoded, 0, decoded.Length);
                participant.FloatProvider.Volume = Volume;
                if (DirectionalAudio)
                {
                    participant.MonoToStereo.LeftVolume = (float)(0.5 + Math.Sin(RotationSource) * 0.5);
                    participant.MonoToStereo.RightVolume = (float)(0.5 - Math.Sin(RotationSource) * 0.5);
                }
                participant.WaveProvider.AddSamples(decodedBytes, 0, BytesRecorded);
            }
            catch { }
        }
    });

    if (!Stopping && AudioPlayer.PlaybackState == PlaybackState.Stopped)
        AudioPlayer.Play();

    return Task.CompletedTask;
}
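
The answer does not show where each participant's decoder is constructed. Presumably the Login case from the question now also creates one per participant, along the lines of (a sketch, assuming the same SampleRate and Channels used for the original shared decoder):

case VCSignalling_Packet.PacketIdentifier.Login:
   var participant = new ParticipantModel()
   {
      LoginKey = packet.PacketLoginKey,
      Name = packet.PacketName,
      WaveProvider = new BufferedWaveProvider(VoipService.GetRecordFormat) { DiscardOnBufferOverflow = true },
      // One decoder per participant, so each remote stream keeps its own decoder state.
      Decoder = new OpusDecoder(SampleRate, Channels)
   };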

After this, I came to the conclusion that audio codecs like Opus work on a per-stream basis: the decoder carries state from one frame to the next, so interleaving frames from different speakers through a single shared decoder corrupts its internal state. It does not decode each packet independently in isolation.

