Concentus: Distorted audio after decoding and mixing

Question
I have made a simple VOIP group chat that works perfectly when there is no encoder/decoder. However, as soon as I implement a codec and mix multiple decoded audio streams, there is a ton of noise. If it's just 1-on-1 talking, there is no noise.
This is what I tried.
Currently I am recording 48 kHz, 40 ms, 16-bit mono audio and encoding with the Concentus lib. Code below... (the BytesToShorts and ShortsToBytes methods are direct copies from the Concentus lib's demo.)
private void AudioDataAvailable(object? sender, WaveInEventArgs e)
{
    if (IsDeafened || IsMuted)
        return;

    float max = 0;
    // interpret as 16 bit audio
    for (int index = 0; index < e.BytesRecorded; index += 2)
    {
        short sample = (short)((e.Buffer[index + 1] << 8) | e.Buffer[index + 0]);
        // to floating point
        var sample32 = sample / 32768f;
        // absolute value
        if (sample32 < 0) sample32 = -sample32;
        if (sample32 > max) max = sample32;
    }

    if (max > 0.08)
    {
        RecordDetection = DateTime.UtcNow;
    }

    if (DateTime.UtcNow.Subtract(RecordDetection).Seconds < 1)
    {
        short[] pcm = BytesToShorts(e.Buffer, 0, e.BytesRecorded);
        byte[] encoded = new byte[1000];
        Encoder.Encode(pcm, 0, 960, encoded, 0, encoded.Length);
        var voicePacket = new VoicePacket()
        {
            PacketAudio = encoded,
            PacketDataIdentifier = PacketIdentifier.Audio,
            PacketVersion = Network.Network.Version,
            PacketBytesRecorded = e.BytesRecorded
        };
        VCClient.Send(voicePacket);
    }
}
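As an editorial side note: 40 ms at 48 kHz mono is 1920 samples per buffer, while the Encode call above passes 960 (20 ms), and the whole 1000-byte scratch buffer is sent rather than just the encoded packet. A standalone sketch of the frame-size arithmetic, and of the usual pattern of using the encoder's return value as the real packet length, follows. It assumes the Concentus 1.x API used in the question; the variable names are illustrative, not from the original code:

```csharp
using System;
using Concentus.Enums;
using Concentus.Structs;

class FrameSizeSketch
{
    static void Main()
    {
        const int SampleRate = 48000;
        const int Channels = 1;
        const int FrameMs = 40;

        // 48 000 samples/s * 0.040 s = 1920 samples per 40 ms frame;
        // 1920 samples * 2 bytes * 1 channel = 3840 bytes of 16-bit PCM.
        int samplesPerFrame = SampleRate * FrameMs / 1000; // 1920
        short[] pcm = new short[samplesPerFrame * Channels]; // silence, for illustration

        var encoder = new OpusEncoder(SampleRate, Channels, OpusApplication.OPUS_APPLICATION_VOIP);
        byte[] scratch = new byte[1000];

        // Encode returns the number of bytes it actually wrote into `scratch`,
        // so only that many bytes need to go on the wire.
        int packetLength = encoder.Encode(pcm, 0, samplesPerFrame, scratch, 0, scratch.Length);
        byte[] packet = new byte[packetLength];
        Array.Copy(scratch, packet, packetLength);
        Console.WriteLine($"{samplesPerFrame} samples -> {packetLength} byte packet");
    }
}
```

Sending only `packet` (and its length) also lets the receiver pass the true packet size to Decode instead of the padded buffer length.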
Then on the decode side I do this...
private Task VC_OnAudioReceived(byte[] Audio, string Key, float Volume, int BytesRecorded, float RotationSource)
{
    _ = Task.Factory.StartNew(() =>
    {
        var participant = Participants.FirstOrDefault(x => x.LoginKey == Key);
        if (participant != null)
        {
            try
            {
                short[] decoded = new short[960];
                Decoder.Decode(Audio, 0, Audio.Length, decoded, 0, 960, false);
                byte[] decodedBytes = ShortsToBytes(decoded, 0, decoded.Length);
                participant.FloatProvider.Volume = Volume;
                if (DirectionalAudio)
                {
                    participant.MonoToStereo.LeftVolume = (float)(0.5 + Math.Sin(RotationSource) * 0.5);
                    participant.MonoToStereo.RightVolume = (float)(0.5 - Math.Sin(RotationSource) * 0.5);
                }
                participant.WaveProvider.AddSamples(decodedBytes, 0, BytesRecorded);
            }
            catch { }
        }
    });

    if (!Stopping && AudioPlayer.PlaybackState == PlaybackState.Stopped)
        AudioPlayer.Play();

    return Task.CompletedTask;
}
The participant variable consists of
public class ParticipantModel
{
    public string Name { get; set; } = "";
    public string LoginKey { get; set; } = "";
    public BufferedWaveProvider WaveProvider { get; set; }
    public Wave16ToFloatProvider FloatProvider { get; set; }
    public MonoToStereoSampleProvider MonoToStereo { get; set; }
}
and is created like so...
case VCSignalling_Packet.PacketIdentifier.Login:
    var participant = new ParticipantModel()
    {
        LoginKey = packet.PacketLoginKey,
        Name = packet.PacketName,
        WaveProvider = new BufferedWaveProvider(VoipService.GetRecordFormat) { DiscardOnBufferOverflow = true }
    };
    participant.FloatProvider = new Wave16ToFloatProvider(participant.WaveProvider);
    participant.MonoToStereo = new NAudio.Wave.SampleProviders.MonoToStereoSampleProvider(participant.FloatProvider.ToSampleProvider());
    OnParticipantLogin?.Invoke(participant);
    break;
Record and Playback formats
public static WaveFormat GetRecordFormat { get => new WaveFormat(SampleRate, 16, Channels); }
public static WaveFormat GetAudioFormat { get => WaveFormat.CreateIeeeFloatWaveFormat(SampleRate, Channels * 2); }
Opus encoder and decoder initializations.
Encoder = new OpusEncoder(SampleRate, Channels, Concentus.Enums.OpusApplication.OPUS_APPLICATION_VOIP);
Encoder.Complexity = 0;
Encoder.UseVBR = true;
Decoder = new OpusDecoder(SampleRate, Channels);
Mixer, Recorder and AudioPlayer
Mixer = new MixingSampleProvider(GetAudioFormat);
Normalizer = new SoftLimiter(Mixer);
Normalizer.Boost.CurrentValue = 5;
AudioRecorder = audioManager.CreateRecorder(GetRecordFormat);
AudioPlayer = audioManager.CreatePlayer(Normalizer);
The SoftLimiter class is taken from https://www.markheath.net/post/limit-audio-naudio
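For context, the question does not show how each participant's MonoToStereo provider reaches the Mixer. With NAudio's MixingSampleProvider, that wiring would typically look like the sketch below; this is an assumption about the missing plumbing, not the author's actual code:

```csharp
// Sketch (assumed wiring, not shown in the question).
// ReadFully makes the mixer keep producing (silence) even when a
// participant's buffer is momentarily empty, instead of ending playback.
Mixer = new MixingSampleProvider(GetAudioFormat) { ReadFully = true };

// When a participant logs in:
Mixer.AddMixerInput(participant.MonoToStereo);

// When a participant disconnects:
Mixer.RemoveMixerInput(participant.MonoToStereo);
```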
I have isolated the problem to the codec. When doing everything without compressing the audio, everything is clear and works fine with 3-5+ people talking at the same time. As soon as a codec is implemented, the mix of just 2 people talking at the same time gets distorted and crackly. (I've tried the G.722 codec before.)
I know there is some loss going on with the decoder, but I did not expect it to affect mixing this much. What is going on and what can I do to fix it? Would I have to do audio cleaning, or is there something I am missing?
Answer 1 (score: 0)
I guess it's one of those moments again where you ask a question, then a day later you try something and it works...
Alright, so I figured out why there are tons of distortions when 2 people talk at the same time with an audio codec in the middle. What I had to do was give each connected client its own decoder on the client side. So I modified both snippets below...
public class ParticipantModel
{
    public string Name { get; set; } = "";
    public string LoginKey { get; set; } = "";
    public BufferedWaveProvider WaveProvider { get; set; }
    public Wave16ToFloatProvider FloatProvider { get; set; }
    public MonoToStereoSampleProvider MonoToStereo { get; set; }
    public OpusDecoder Decoder { get; set; }
}
private Task VC_OnAudioReceived(byte[] Audio, string Key, float Volume, int BytesRecorded, float RotationSource)
{
    _ = Task.Factory.StartNew(() =>
    {
        var participant = Participants.FirstOrDefault(x => x.LoginKey == Key);
        if (participant != null)
        {
            try
            {
                short[] decoded = new short[960];
                // Use this participant's own decoder, not a shared one.
                participant.Decoder.Decode(Audio, 0, Audio.Length, decoded, 0, decoded.Length, false);
                byte[] decodedBytes = ShortsToBytes(decoded, 0, decoded.Length);
                participant.FloatProvider.Volume = Volume;
                if (DirectionalAudio)
                {
                    participant.MonoToStereo.LeftVolume = (float)(0.5 + Math.Sin(RotationSource) * 0.5);
                    participant.MonoToStereo.RightVolume = (float)(0.5 - Math.Sin(RotationSource) * 0.5);
                }
                participant.WaveProvider.AddSamples(decodedBytes, 0, BytesRecorded);
            }
            catch { }
        }
    });
    return Task.CompletedTask;
}
After this, I came to the conclusion that audio codecs work on a per-stream basis: the decoder is stateful and uses the preceding packets of a stream to predict the next one, so a single decoder cannot be shared across several senders' streams.
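The answer adds a Decoder property to ParticipantModel but doesn't show where each instance is created. A minimal sketch of the matching change to the login handler, assuming the same SampleRate and Channels that the old shared decoder used, would be:

```csharp
case VCSignalling_Packet.PacketIdentifier.Login:
    var participant = new ParticipantModel()
    {
        LoginKey = packet.PacketLoginKey,
        Name = packet.PacketName,
        WaveProvider = new BufferedWaveProvider(VoipService.GetRecordFormat) { DiscardOnBufferOverflow = true },
        // One decoder per remote sender: Opus decoding is stateful, so
        // interleaving packets from different senders through one decoder
        // corrupts its predictive state and produces crackling.
        Decoder = new OpusDecoder(SampleRate, Channels)
    };
    participant.FloatProvider = new Wave16ToFloatProvider(participant.WaveProvider);
    participant.MonoToStereo = new NAudio.Wave.SampleProviders.MonoToStereoSampleProvider(participant.FloatProvider.ToSampleProvider());
    OnParticipantLogin?.Invoke(participant);
    break;
```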