What exactly happens to large amplitude values when we load a .wav file using librosa?

Question

I want to understand what happens to large amplitude values of a .wav file, when I load them using librosa.

I was examining the amplitude values of a .wav file by plotting its waveform with librosa. Now I want to see how scaling these amplitude values affects the sound, so I multiplied them by a scaling factor. However, when I played the result using IPython.display.Audio, I could not hear any difference:

from IPython.display import Audio, display

scaled_signal = signal * 10  # signal is the original sample

# play the scaled signal
print('Play the scaled sample:')
display(Audio(data=scaled_signal, rate=sr))

So I saved the scaled signal to a file on my PC and played it there. I could hear the difference, so the amplitude was indeed scaled. Then I reloaded this file using librosa. Surprisingly, when I played it again in my Jupyter notebook, I could now hear the effect of the scaling:

import librosa
import soundfile

soundfile.write('scaled_signal.wav', scaled_signal, sr)

# load the scaled signal again
scaled_signal, sr = librosa.load('scaled_signal.wav', sr=sr)

print('The scaled sample loaded again')
display(Audio(data=scaled_signal, rate=sr))

However, when I plotted the waveforms (see below), I could see that the shape of the reloaded signal had changed. It looks as if an upper bound had been applied to the magnitude of the amplitudes. Can you help me understand what happened and why?

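import matplotlib.pyplot as plt
from librosa.display import waveshow

fig, axs = plt.subplots(1, 2)
fig.set_figwidth(18)

waveshow(signal, sr=sr, ax=axs[0])         # original signal
waveshow(scaled_signal, sr=sr, ax=axs[1])  # scaled signal after saving and reloading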

[Figure: waveform of the original signal (left) and of the scaled signal after saving and reloading (right)]

Answer 1

Score: 0

librosa.load() does not apply any data-dependent normalization or scaling. It only maps integer sample formats such as int16/int32 to floating-point values in the range -1.0 to 1.0.
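To illustrate what a fixed, data-independent mapping means, here is a minimal NumPy sketch of an int16 to float conversion. It assumes the common convention of dividing by 2**15 = 32768, so the signed int16 range lands in [-1.0, 1.0). librosa delegates decoding to its audio backend, so the exact factor there is an implementation detail. `pcm16_to_float` is a hypothetical helper, not a librosa function.

```python
import numpy as np


def pcm16_to_float(samples: np.ndarray) -> np.ndarray:
    """Map int16 PCM samples to float32 using a fixed factor of 1/32768.

    The factor depends only on the sample format, never on the data,
    so a quiet recording stays quiet after conversion.
    """
    return samples.astype(np.float32) / 32768.0


# Hypothetical samples: the int16 extremes plus a quiet value.
pcm = np.array([-32768, -16384, 0, 100, 32767], dtype=np.int16)
print(pcm16_to_float(pcm))
```

Because the scale factor is constant, a file that was written quietly loads quietly, and a file whose peaks were flattened when it was saved loads with those flattened peaks intact.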

From the documentation for IPython.display.Audio, which you are using to play back the audio:

> If the array option is used the waveform will be normalized.
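The following sketch reproduces both observations from the question using NumPy alone. `peak_normalize` and `store_as_pcm16` are hypothetical helpers. The first mimics what `Audio` does to array input, assuming (as IPython's source suggests) that the array is divided by its peak absolute value. The second assumes two things: the file was written as 16-bit PCM, which is soundfile's default subtype for WAV, and out-of-range samples were clipped to full scale, as the flattened peaks in the question's plot suggest. The 440 Hz tone is a stand-in for the real recording.

```python
import numpy as np


def peak_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale x so its largest absolute sample is 1.0.

    Hypothetical helper mimicking the array normalization assumed
    for IPython.display.Audio.
    """
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x


def store_as_pcm16(x: np.ndarray) -> np.ndarray:
    """Simulate a float -> 16-bit PCM -> float round trip.

    Assumption: samples outside the int16 range are clipped to full scale.
    """
    clipped = np.clip(x, -1.0, 32767 / 32768)
    return np.round(clipped * 32768).astype(np.int16).astype(np.float32) / 32768


# Hypothetical test tone standing in for the original recording.
sr = 22050
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

scaled = signal * 10  # peak of about 5.0, far outside [-1, 1]

# 1) Played straight from the array: normalization cancels the factor of 10.
print(np.allclose(peak_normalize(scaled), peak_normalize(signal)))  # True

# 2) Saved as 16-bit PCM and reloaded: the peaks are clipped, and the
#    distortion survives normalization.
reloaded = store_as_pcm16(scaled)
print(reloaded.max())
print(np.allclose(peak_normalize(reloaded), peak_normalize(signal)))  # False
```

Under these assumptions, the ×10 array sounds identical in the notebook because normalization removes the factor. The reloaded file sounds different because its clipped, flattened peaks are part of the data. To keep out-of-range values in the file, passing `subtype='FLOAT'` to `soundfile.write` should store 32-bit float samples instead of 16-bit PCM. Recent IPython versions also accept `Audio(..., normalize=False)`, which expects samples within [-1, 1] and raises an error otherwise.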

huangapple
  • Posted on 12 July 2023 at 22:28:19
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76671666.html