What exactly happens to large amplitude values when we load a .wav file using librosa?
Question
I want to understand what happens to large amplitude values of a .wav file when I load it using librosa.
I was trying to understand the amplitude values of .wav files by viewing the waveform with librosa. Now I want to see how scaling these amplitude values affects the sound, so I multiplied the values by a scaling factor. However, when I played the result using IPython.display.Audio, I could not hear any effect on the sound:
scaled_signal = signal * 10 # signal is the original sample
# play the scaled signal
print('Play the scaled sample:')
display(Audio(data = scaled_signal, rate = sr))
So I saved the file to my PC, and there I could hear the difference. The amplitude was indeed scaled. Then I decided to reload this file using librosa. Surprisingly, when I played the reloaded file in my Jupyter notebook, I could hear the effect of the scaling:
soundfile.write('scaled_signal.wav', scaled_signal, sr)
# loading the scaled signal again
scaled_signal, sr = librosa.load('scaled_signal.wav', sr = sr)
print('The scaled sample loaded again')
display(Audio(data = scaled_signal, rate = sr))
However, on plotting the waveforms (see below), I could see that the shape of the scaled signal had changed. Can you help me understand what happened and why? It looks as if an upper bound was applied to the magnitude of the amplitudes.
fig, axs = plt.subplots(1, 2)
fig.set_figwidth(18)
waveshow(signal, sr = sr, ax = axs[0])
waveshow(scaled_signal, sr = sr, ax = axs[1])
Answer 1
Score: 0
librosa.load() does not apply any data-dependent normalization or scaling. It only maps integer sample formats such as int16/int32 to a fixed floating-point range of -1.0 to 1.0.
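To make that fixed mapping concrete, here is a minimal NumPy sketch. It is not the actual code of librosa or soundfile: `float_to_pcm16` and `pcm16_to_float` are hypothetical helpers, and both the scale factor of 32768 and the clipping on write are assumptions about how a 16-bit PCM .wav round trip typically behaves. Under those assumptions, samples scaled beyond ±1.0 are cut off when the file is written, which would explain the flattened waveform seen after reloading.

```python
import numpy as np

def float_to_pcm16(x):
    """Hypothetical helper: float samples -> int16, clipping out-of-range values."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * 32768.0)
    return np.clip(scaled, -32768, 32767).astype(np.int16)

def pcm16_to_float(pcm):
    """Hypothetical helper: int16 samples -> float32 in the fixed range [-1.0, 1.0)."""
    return pcm.astype(np.float32) / 32768.0

# A one-second 440 Hz test tone with peak amplitude of about 0.5 (assumed data).
sr = 22050
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)
scaled_signal = signal * 10  # peaks near 5.0, far outside [-1.0, 1.0]

# Simulate "write as 16-bit PCM, then load again".
reloaded = pcm16_to_float(float_to_pcm16(scaled_signal))

print("peak before saving:", np.abs(scaled_signal).max())
print("peak after reload: ", np.abs(reloaded).max())
print("fraction of clipped samples:", np.mean(np.abs(scaled_signal) > 1.0))
```

The unscaled signal survives the same round trip almost unchanged, since all of its samples already lie inside the representable range.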
From the documentation for IPython.display.Audio, which you are using to play back the audio:
> If the array option is used the waveform will be normalized.
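The effect of that normalization can be sketched as follows. `peak_normalize` is a hypothetical stand-in for what the player does with array input, assuming it rescales by the largest absolute sample; it is not IPython's own code. Under that assumption, a signal and its 10x copy become identical before playback, which is why multiplying by 10 made no audible difference in the notebook.

```python
import numpy as np

def peak_normalize(x):
    """Hypothetical helper: rescale so the largest absolute sample becomes 1.0."""
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x))
    return x if peak == 0 else x / peak

# A quiet one-second 440 Hz test tone (assumed data) and a 10x louder copy.
sr = 22050
t = np.arange(sr) / sr
signal = 0.1 * np.sin(2 * np.pi * 440.0 * t)
scaled_signal = signal * 10

# After peak normalization the two arrays are the same up to rounding error.
same = np.allclose(peak_normalize(signal), peak_normalize(scaled_signal))
print("identical after normalization:", same)
```

Recent IPython versions also accept `Audio(data, rate=sr, normalize=False)`, which, as far as I know, plays the samples without rescaling but expects them to lie within [-1, 1]. A reloaded file whose peaks were already cut off at ±1.0 is a different waveform from the original, so normalization cannot hide that change.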