Understanding mel-scaled spectrogram for a simple sine wave
Question
I generate a simple sine wave with a frequency of 100 Hz and calculate an FFT to check that the obtained frequency is correct.
Then I calculate melspectrogram, but I do not understand what its output means. Where do I see the frequency 100 in this output? Why is the yellow bar located in the 25th area?
import numpy as np
import matplotlib.pyplot as plt
import scipy.fft
import librosa


def generate_sine_wave(freq, sample_rate, duration) -> tuple[np.ndarray, np.ndarray]:
    # Time axis in seconds; endpoint=False keeps the spacing at exactly 1/sample_rate
    x = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    frequencies = x * freq
    # 2pi because np.sin takes radians
    y = np.sin(2 * np.pi * frequencies)
    return x, y


sample_rate = 1024
freq = 100
x, y = generate_sine_wave(freq, sample_rate, 2)

plt.figure(figsize=(10, 4))
plt.plot(x, y)
plt.grid(True)

# Magnitude spectrum of the positive-frequency half
fft = scipy.fft.fft(y)
fft = fft[0 : len(fft) // 2]
fft = np.abs(fft)
# Bin k sits at k * sample_rate / len(y) Hz, so the axis stops just below Nyquist
xs = np.linspace(0, sample_rate / 2, len(fft), endpoint=False)
plt.figure(figsize=(15, 4))
plt.plot(xs, fft)
plt.grid(True)

# Defaults used here: n_fft=2048, hop_length=512, n_mels=128
melsp = librosa.feature.melspectrogram(sr=sample_rate, y=y)
melsp = melsp.T  # rows are frames, columns are mel bins
plt.matshow(melsp)
plt.title('melspectrogram')
melsp_max = np.max(melsp)  # renamed so the built-in max() is not shadowed
print('melsp.shape =', melsp.shape)
print('melsp max =', melsp_max)
If I change the frequency to 200, melspectrogram shows the yellow bar in the 50 area instead. Why?
Answer 1
Score: 2
librosa's melspectrogram function computes a mel-scaled spectrogram. This is the same as the usual linear-scale spectrogram, but with the frequency axis resampled to a warped mel scale.
Relating a particular bin ("why 25?") to frequency in Hz is complicated but doable:
- melspectrogram maps the frequency range [0, sr/2] to mel space. In your example, [0, 512] Hz maps to mel in the range 0 to 7.68 (= librosa.hz_to_mel(512)).
- The range is uniformly divided into 128 bins (by default). The ith mel bin center corresponds approximately to librosa.mel_to_hz(i * 7.68 / 127).
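Then for bins 25 and 50 in particular, we can verify that they correspond to the expected frequencies:
librosa.mel_to_hz(25 * 7.68 / 127) = 100.7874
librosa.mel_to_hz(50 * 7.68 / 127) = 201.5748
librosa's default mel scale is linear in Hz below 1000 Hz, which is why doubling the frequency from 100 to 200 Hz doubles the bin index from 25 to 50.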
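These numbers can be reproduced without librosa. The following is a minimal sketch, not librosa's own code: it assumes the Slaney-style formula that librosa applies by default (htk=False), which is linear below 1000 Hz and logarithmic above. The names F_SP, MIN_LOG_HZ, MIN_LOG_MEL and LOGSTEP are chosen here for illustration.

```python
import math

# Assumed constants of the Slaney-style mel scale (librosa default, htk=False)
F_SP = 200.0 / 3.0               # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0              # frequency where the log region starts
MIN_LOG_MEL = MIN_LOG_HZ / F_SP  # the same point in mels (15.0)
LOGSTEP = math.log(6.4) / 27.0   # step size in the log region


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels."""
    if hz < MIN_LOG_HZ:
        return hz / F_SP
    return MIN_LOG_MEL + math.log(hz / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(mel: float) -> float:
    """Convert a mel value back to Hz."""
    if mel < MIN_LOG_MEL:
        return mel * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (mel - MIN_LOG_MEL))


sample_rate = 1024
mel_max = hz_to_mel(sample_rate / 2)  # upper edge of the mel range, about 7.68
print("mel_max =", round(mel_max, 4))

# The approximation used in the answer: bin i sits at i * mel_max / 127
for i in (25, 50):
    print(i, "->", round(mel_to_hz(i * mel_max / 127), 4), "Hz")
```

With sample_rate = 1024 the whole range lies in the linear region, so each bin index is simply proportional to frequency. A signal sampled at, say, 22050 Hz would also use the logarithmic branch, and the proportionality would no longer hold for the upper bins.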
For plotting, the melspectrogram documentation suggests displaying mel-scale spectrograms using librosa.display.specshow with the option y_axis='mel', like:
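import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

# S: mel spectrogram as returned by melspectrogram (not transposed); sr: its sample rate
fig, ax = plt.subplots()
S_dB = librosa.power_to_db(S, ref=np.max)
# fmax=8000 comes from the documentation example; it should match the fmax used for S
img = librosa.display.specshow(S_dB, x_axis='time',
                               y_axis='mel', sr=sr,
                               fmax=8000, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')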
This plots the mel spectrogram with the y axis labeled in Hz, but correctly warped for the mel scale.
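If the plot is kept as a plain matshow image instead, the bin indices can be converted to Hz by hand. The sketch below is not librosa code. It assumes that mel filter i peaks at the (i + 1)-th of n_mels + 2 equally spaced mel points, which is my understanding of how the triangular filters are laid out, and it only handles the linear part of the default mel scale (Nyquist at or below 1000 Hz). The helper names mel_bin_centers_hz and nearest_bin are hypothetical.

```python
HZ_PER_MEL = 200.0 / 3.0  # assumed slope of the default mel scale below 1000 Hz


def mel_bin_centers_hz(sample_rate: float, n_mels: int = 128) -> list[float]:
    """Approximate center frequency in Hz of every mel bin (linear region only)."""
    nyquist = sample_rate / 2
    if nyquist > 1000.0:
        raise ValueError("this sketch only covers the linear region (Nyquist <= 1000 Hz)")
    mel_max = nyquist / HZ_PER_MEL
    # n_mels + 2 equally spaced points: the two outer ones are edges,
    # the inner n_mels points are taken as the filter peaks
    step = mel_max / (n_mels + 1)
    return [(i + 1) * step * HZ_PER_MEL for i in range(n_mels)]


def nearest_bin(freq_hz: float, centers: list[float]) -> int:
    """Index of the bin whose center is closest to freq_hz."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - freq_hz))


centers = mel_bin_centers_hz(sample_rate=1024)
for f in (100, 200):
    i = nearest_bin(f, centers)
    print(f, "Hz -> bin", i, "centered at", round(centers[i], 2), "Hz")
```

Under that assumption the centers differ slightly from the i * 7.68 / 127 approximation above, so the nearest bins come out as 24 and 49 rather than exactly 25 and 50. A pure tone also leaks into the neighboring bin, which is consistent with the bar appearing "around" 25 and 50 in the plots.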