Concentus: Distorted audio after decoding and mixing

Question

I have made a simple VoIP group chat that works perfectly when there is no encoder/decoder; however, as soon as I implement a codec and mix multiple decoded audio streams, there is a ton of noise. If it's just 1-on-1 talking, there is no noise.

This is what I tried. Currently I am recording 48 kHz, 40 ms, 16-bit mono audio and encoding it with the Concentus lib. Code below... (The BytesToShorts and ShortsToBytes methods are direct copies from the Concentus lib's demo.)

private void AudioDataAvailable(object? sender, WaveInEventArgs e)
{
    if (IsDeafened || IsMuted)
        return;

    float max = 0;
    // interpret as 16 bit audio
    for (int index = 0; index < e.BytesRecorded; index += 2)
    {
        short sample = (short)((e.Buffer[index + 1] << 8) |
                                e.Buffer[index + 0]);
        // to floating point
        var sample32 = sample / 32768f;
        // absolute value 
        if (sample32 < 0) sample32 = -sample32;
        if (sample32 > max) max = sample32;
    }

    if (max > 0.08)
    {
        RecordDetection = DateTime.UtcNow;
    }

    if (DateTime.UtcNow.Subtract(RecordDetection).Seconds < 1)
    {
        short[] pcm = BytesToShorts(e.Buffer, 0, e.BytesRecorded);
        byte[] encoded = new byte[1000];
        Encoder.Encode(pcm, 0, 960, encoded, 0, encoded.Length);
        var voicePacket = new VoicePacket()
        {
            PacketAudio = encoded,
            PacketDataIdentifier = PacketIdentifier.Audio,
            PacketVersion = Network.Network.Version,
            PacketBytesRecorded = e.BytesRecorded
        };
        VCClient.Send(voicePacket);
    }
}
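
For reference, the BytesToShorts and ShortsToBytes helpers mentioned above are small little-endian PCM converters. A rough sketch of what the Concentus demo versions do (reproduced from memory, so treat the exact signatures as an assumption):

// Interpret a little-endian 16-bit PCM byte buffer as an array of shorts.
static short[] BytesToShorts(byte[] input, int offset, int length)
{
    short[] processedValues = new short[length / 2];
    for (int i = 0; i < processedValues.Length; i++)
    {
        processedValues[i] = (short)(input[(i * 2) + offset]
            | (input[(i * 2) + 1 + offset] << 8));
    }
    return processedValues;
}

// Write an array of shorts back out as little-endian 16-bit PCM bytes.
static byte[] ShortsToBytes(short[] input, int offset, int length)
{
    byte[] processedValues = new byte[length * 2];
    for (int i = 0; i < length; i++)
    {
        processedValues[i * 2] = (byte)(input[i + offset] & 0xFF);
        processedValues[(i * 2) + 1] = (byte)((input[i + offset] >> 8) & 0xFF);
    }
    return processedValues;
}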

Then on the decode side I do this...

private Task VC_OnAudioReceived(byte[] Audio, string Key, float Volume, int BytesRecorded, float RotationSource)
{
    _ = Task.Factory.StartNew(() =>
    {
        var participant = Participants.FirstOrDefault(x => x.LoginKey == Key);
        if (participant != null)
        {
            try
            {
                short[] decoded = new short[960];
                Decoder.Decode(Audio, 0, Audio.Length, decoded, 0, 960, false);
                byte[] decodedBytes = ShortsToBytes(decoded, 0, decoded.Length);
                participant.FloatProvider.Volume = Volume;
                if (DirectionalAudio)
                {
                    participant.MonoToStereo.LeftVolume = (float)(0.5 + Math.Sin(RotationSource) * 0.5);
                    participant.MonoToStereo.RightVolume = (float)(0.5 - Math.Sin(RotationSource) * 0.5);
                }
                participant.WaveProvider.AddSamples(decodedBytes, 0, BytesRecorded);
            }
            catch { }
        }
    });

    if(!Stopping && AudioPlayer.PlaybackState == PlaybackState.Stopped)
        AudioPlayer.Play();

    return Task.CompletedTask;
}

The participant variable consists of

public class ParticipantModel
{
   public string Name { get; set; } = "";
   public string LoginKey { get; set; } = "";
   public BufferedWaveProvider WaveProvider { get; set; }
   public Wave16ToFloatProvider FloatProvider { get; set; }
   public MonoToStereoSampleProvider MonoToStereo { get; set; }
}

and is created like so...

case VCSignalling_Packet.PacketIdentifier.Login:
   var participant = new ParticipantModel()
   {
      LoginKey = packet.PacketLoginKey,
      Name = packet.PacketName,
      WaveProvider = new BufferedWaveProvider(VoipService.GetRecordFormat) { DiscardOnBufferOverflow = true }
   };
   participant.FloatProvider = new Wave16ToFloatProvider(participant.WaveProvider);
   participant.MonoToStereo = new NAudio.Wave.SampleProviders.MonoToStereoSampleProvider(participant.FloatProvider.ToSampleProvider());

   OnParticipantLogin?.Invoke(participant);
   break;

Record and Playback formats

public static WaveFormat GetRecordFormat { get => new WaveFormat(SampleRate, 16, Channels); }
public static WaveFormat GetAudioFormat { get => WaveFormat.CreateIeeeFloatWaveFormat(SampleRate, Channels * 2); }
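
For reference, at 48 kHz, 16-bit mono, a 40 ms capture buffer holds 48000 × 0.040 = 1920 samples (3840 bytes), while the frame size of 960 samples passed to Encode above corresponds to 20 ms.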

Opus encoder and decoder initializations.

Encoder = new OpusEncoder(SampleRate, Channels, Concentus.Enums.OpusApplication.OPUS_APPLICATION_VOIP);
Encoder.Complexity = 0;
Encoder.UseVBR = true;

Decoder = new OpusDecoder(SampleRate, Channels);

Mixer, Recorder and AudioPlayer

Mixer = new MixingSampleProvider(GetAudioFormat);
Normalizer = new SoftLimiter(Mixer);
Normalizer.Boost.CurrentValue = 5;

AudioRecorder = audioManager.CreateRecorder(GetRecordFormat);
AudioPlayer = audioManager.CreatePlayer(Normalizer);

The SoftLimiter class was taken and implemented from https://www.markheath.net/post/limit-audio-naudio
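
One piece not shown above is how each participant's stream actually reaches the mixer. Presumably the participant's MonoToStereo provider is added as a mixer input on login; a minimal sketch using NAudio's MixingSampleProvider API (the exact hook points are an assumption, since they are not shown in the question):

// On participant login: route the participant's stereo stream into the shared mixer.
// The input's WaveFormat must match the mixer's (stereo IEEE float at SampleRate here).
Mixer.AddMixerInput(participant.MonoToStereo);

// On participant logout: remove the input so the mixer stops pulling from a dead stream.
Mixer.RemoveMixerInput(participant.MonoToStereo);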

I have isolated the problem to the codec. When doing everything without compressing the audio, everything is clear and works fine with 3-5+ people talking at the same time. As soon as a codec is implemented, the mixed audio with 2 people talking at the same time gets distorted and crackly. (I've tried the G.722 codec before.)

I know there is some loss going on in the decoder, but I did not expect it to affect mixing this much. What is going on, and what can I do to fix it? Would I have to do audio cleaning, or is there something I am missing?


Answer 1

Score: 0

I guess it's one of those moments again where you ask a question and then a day later you try something and it works...

Alright, so I figured out why there are tons of distortions when 2 people talk at the same time with an audio codec in the middle. I had to change it so there is one decoder per connected client on the client side. I modified both snippets below...

public class ParticipantModel
{
    public string Name { get; set; } = "";
    public string LoginKey { get; set; } = "";
    public BufferedWaveProvider WaveProvider { get; set; }
    public Wave16ToFloatProvider FloatProvider { get; set; }
    public MonoToStereoSampleProvider MonoToStereo { get; set; }
    public OpusDecoder Decoder { get; set; }
}

private Task VC_OnAudioReceived(byte[] Audio, string Key, float Volume, int BytesRecorded, float RotationSource)
{
    _ = Task.Factory.StartNew(() =>
    {
        var participant = Participants.FirstOrDefault(x => x.LoginKey == Key);
        if (participant != null)
        {
            try
            {
                short[] decoded = new short[960];
                participant.Decoder.Decode(Audio, 0, Audio.Length, decoded, 0, decoded.Length, false);
                byte[] decodedBytes = ShortsToBytes(decoded, 0, decoded.Length);
                participant.FloatProvider.Volume = Volume;
                if (DirectionalAudio)
                {
                    participant.MonoToStereo.LeftVolume = (float)(0.5 + Math.Sin(RotationSource) * 0.5);
                    participant.MonoToStereo.RightVolume = (float)(0.5 - Math.Sin(RotationSource) * 0.5);
                }
                participant.WaveProvider.AddSamples(decodedBytes, 0, BytesRecorded);
            }
            catch { }
        }
    });

    if (!Stopping && AudioPlayer.PlaybackState == PlaybackState.Stopped)
        AudioPlayer.Play();

    return Task.CompletedTask;
}
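
The answer does not show where each participant's decoder is constructed. Presumably the Login case from the question now also creates one per participant, along the lines of (a sketch, assuming the same SampleRate and Channels used for the original shared decoder):

case VCSignalling_Packet.PacketIdentifier.Login:
   var participant = new ParticipantModel()
   {
      LoginKey = packet.PacketLoginKey,
      Name = packet.PacketName,
      WaveProvider = new BufferedWaveProvider(VoipService.GetRecordFormat) { DiscardOnBufferOverflow = true },
      // One decoder per participant, so each remote stream keeps its own decoder state.
      Decoder = new OpusDecoder(SampleRate, Channels)
   };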

After this, I came to the conclusion that audio codecs like Opus work on a per-stream basis: the decoder carries state from one frame to the next, so interleaving frames from different speakers through a single shared decoder corrupts its internal state. It does not decode each packet independently in isolation.

