Using C within Python code for faster fourier transform?

Question
I have a function that computes the fourier transform of an incoming audio stream, and then identifies what musical notes are most dominant within the audio. Everything is working (though I'm sure my code may be janky in many ways). The main thing I'd like to fix is the speed of the function. I'm using a chunk size of 2^14, which makes the function quite accurate, but also way slower than I'd like. I was wondering if there is any way to implement a C fft function that could run faster, but still have it run within my Python code. I'm also open to any other suggestions on how to get this function to run faster.
Here is the current function:
```python
import struct

import numpy as np
from scipy.signal import find_peaks
# `fft` is assumed here to be pyFFTW's NumPy-compatible interface, since
# NumPy's own np.fft.fft has no `threads` argument
from pyfftw.interfaces import numpy_fft as fft


def pitch_calculations(stream, CHUNK, RATE):
    # Read mic stream and then call struct.unpack to convert from binary data back to floats
    data = stream.read(CHUNK, exception_on_overflow=False)
    dataInt = np.array(struct.unpack(str(CHUNK) + 'h', data))
    # Apply a window function (Hamming) to the input data
    windowed_data = np.hamming(CHUNK) * dataInt
    # Use a fast Fourier transform to convert mic data into frequencies
    fft_result = np.abs(fft.fft(windowed_data, threads=6)) * 2 / (11000 * CHUNK)
    freqs = np.fft.fftfreq(len(windowed_data), d=1.0 / RATE)
    # Keep only the positive-frequency half of the spectrum
    fft_result = fft_result[:len(fft_result) // 2]
    freqs = freqs[:len(freqs) // 2]
    # Find the indices of local maxima in the spectrum using scipy.signal.find_peaks
    localmax_indices = find_peaks(fft_result, height=0.04)[0]
    # Get the magnitudes of the local maxima
    strong_freqs = fft_result[localmax_indices]
    # Sort the magnitudes in descending order
    sorted_indices = np.argsort(strong_freqs)[::-1]
    # Calculate the pitches from the peaks, and how many cents sharp/flat each is
    pitches = [abs(freqs[localmax_indices[index]]) for index in sorted_indices]
    note_list = [calculate_tuning(pitch)[0] for pitch in pitches]
    cent_list = [calculate_tuning(pitch)[1] for pitch in pitches]
    note_list = order_pitches(note_list)
    note_list = remove_duplicates(note_list)
    return note_list, cent_list
```
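Since the question asks about calling a faster C FFT from Python: SciPy's `scipy.fft` module already wraps a native (pocketfft, C++) implementation and exposes a `workers` argument for multithreading, so it may be worth trying before writing any C by hand. A minimal sketch, with random samples standing in for the mic data and the thread count chosen arbitrarily:

```python
import numpy as np
from scipy import fft as sp_fft  # native pocketfft backend, callable from Python

CHUNK = 2 ** 14
RATE = 44100

# Stand-in for one chunk of microphone samples; in the real code this
# comes from stream.read + struct.unpack
windowed_data = np.hamming(CHUNK) * np.random.randn(CHUNK)

# Real-to-complex transform, parallelized over 4 threads via `workers`
spectrum = np.abs(sp_fft.rfft(windowed_data, workers=4))
freqs = sp_fft.rfftfreq(CHUNK, d=1.0 / RATE)
```

Because `rfft` returns only the non-redundant half of the spectrum, the manual positive-frequency slicing in the original function is no longer needed.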
Answer 1

Score: 2
As others have said, NumPy already performs the underlying FFT computation with optimized native code.

The real problem is that a transform size of 2^14 is a nontrivial cost for any implementation. Suggestions to speed it up:

- Reduce the audio sample rate and/or the transform size, if possible. 2^14 seems very high for determining musical note frequencies. It might be enough to downsample the audio to, say, 24 kHz (e.g. with librosa.resample) and use a transform size of 4096, for a spectrum with a frequency resolution of 5.9 Hz per bin. Then use interpolation to estimate peaks with sub-bin precision.
- Use float32 instead of float64: before the transform, cast with `windowed_data = windowed_data.astype(np.float32)`.
- Use a real-to-complex FFT with `np.fft.rfft` instead of the standard complex-to-complex transform. This can save nearly a factor of 2 in computation time.
- Process a batch of windows at a time, instead of one by one. In particular, a batch of FFTs can in some cases be computed more efficiently than each FFT individually. For instance, use `np.fft.rfft(x, axis=0)` with `x` a 2D array to compute the real-to-complex FFT of each column of `x`.
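Several of these suggestions (float32, real-to-complex transform, batched windows, sub-bin interpolation) can be sketched together. This is only an illustration: the frames are random noise standing in for audio, and `refine_peak` is a generic parabolic-interpolation helper, not part of the original code:

```python
import numpy as np

RATE = 24000   # downsampled rate suggested above
N = 4096       # smaller transform size
n_windows = 8

# A batch of frames as columns of a 2D float32 array, windowed in one shot
frames = np.random.randn(N, n_windows).astype(np.float32)
windowed = np.hamming(N).astype(np.float32)[:, None] * frames

# One real-to-complex FFT per column, done in a single batched call
spectra = np.abs(np.fft.rfft(windowed, axis=0))  # shape (N//2 + 1, n_windows)
freqs = np.fft.rfftfreq(N, d=1.0 / RATE)         # ~5.9 Hz per bin


def refine_peak(mag, k):
    """Quadratic (parabolic) interpolation around bin k for sub-bin precision."""
    if k == 0 or k == len(mag) - 1:
        return float(k)
    y0, y1, y2 = mag[k - 1], mag[k], mag[k + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(k)
    # Vertex of the parabola through the three points, offset in [-0.5, 0.5]
    return k + 0.5 * (y0 - y2) / denom


mag = spectra[:, 0]
k = int(np.argmax(mag))
peak_hz = refine_peak(mag, k) * RATE / N
```

With real input, `rfft` already returns only the `N//2 + 1` non-redundant bins, so the positive-half slicing in the question's code disappears as well.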