使用C语言在Python代码中进行更快的傅立叶变换?

huangapple go评论71阅读模式
英文:

Using C within Python code for faster fourier transform?

问题

以下是您提供的代码的翻译部分:

我有一个函数用于计算传入音频流的傅立叶变换然后识别音频中最显著的音符一切都正常运行尽管我的代码可能在许多方面都有问题)。我想要解决的主要问题是函数的速度我使用的块大小为2^14这使得函数相当精确但速度比我想要的慢得多我想知道是否有办法实现一个能够更快运行的C fft函数但仍然能够在我的Python代码中运行我也愿意听取有关如何加快此函数运行速度的其他建议

以下是当前的函数

```python
def pitch_calculations(stream, CHUNK, RATE):
    # 读取麦克风流,并调用struct.unpack将二进制数据转换回浮点数
    data = stream.read(CHUNK, exception_on_overflow=False)
    dataInt = np.array(struct.unpack(str(CHUNK) + 'h', data))
    
    # 对输入数据应用窗口函数(Hamming)
    windowed_data = np.hamming(CHUNK) * dataInt

    # 使用numpy的快速傅立叶变换将麦克风数据转换为频率
    fft_result = np.abs(fft.fft(windowed_data, threads=6)) * 2 / (11000 * CHUNK)
    fft_result_copy = fft_result
    freqs = np.fft.fftfreq(len(windowed_data), d=1.0 / RATE)
    
    fft_result = fft_result[:int(len(fft_result)/2)]
    freqs = freqs[:int(len(freqs)/2)]
    
    # 使用scipy.signal find_peaks找到频率谱中的局部最大值的索引
    localmax_indices = find_peaks(fft_result, height=0.04)[0]
    
    # 获取局部最大值的幅度
    strong_freqs = fft_result[localmax_indices]
    
    # 按降序对幅度进行排序
    sorted_indices = np.argsort(strong_freqs)[::-1]
    
    # 从峰值计算音高,计算多少个cent偏差
    pitches = []
    
    for index in sorted_indices:
        pitches.append(abs(freqs[localmax_indices[index]]))
        
    note_list = [calculate_tuning(pitch)[0] for pitch in pitches]
    cent_list = [calculate_tuning(pitch)[1] for pitch in pitches]
    note_list = order_pitches(note_list)
    note_list = remove_duplicates(note_list)
    
    return note_list, cent_list

希望这能帮助您理解代码的翻译部分。如果您需要更多信息或有其他问题,请随时提出。

英文:

I have a function that computes the fourier transform of an incoming audio stream, and then identifies what musical notes are most dominant within the audio. Everything is working (though I'm sure my code may be janky in many ways). The main thing I'd like to fix is the speed of the function. I'm using a chunk size of 2^14, which makes the function quite accurate, but also way slower than I'd like. I was wondering if there is any way to implement a C fft function that could run faster, but still have it run within my Python code. I'm also open to any other suggestions on how to get this function to run faster.

Here is the current function:

def pitch_calculations(stream, CHUNK, RATE):
    # Read mic stream and then call struct.unpack to convert from binary data back to floats
    data = stream.read(CHUNK, exception_on_overflow=False)
    dataInt = np.array(struct.unpack(str(CHUNK) + 'h', data))
    
    # Apply a window function (Hamming) to the input data
    windowed_data = np.hamming(CHUNK) * dataInt

    # Using numpy fast Fourier transform to convert mic data into frequencies
    fft_result = np.abs(fft.fft(windowed_data, threads=6)) * 2 / (11000 * CHUNK)
    fft_result_copy = fft_result
    freqs = np.fft.fftfreq(len(windowed_data), d=1.0 / RATE)
    
    fft_result = fft_result[:int(len(fft_result)/2)]
    freqs = freqs[:int(len(freqs)/2)]
    
    # Find the indices of local maxima in the frequency spectrum using scipy.signal find_peaks
    localmax_indices = find_peaks(fft_result, height=0.04)[0]
    
    # Get the magnitudes of the local maxima
    strong_freqs = fft_result[localmax_indices]
    
    # Sort the magnitudes in descending order
    sorted_indices = np.argsort(strong_freqs)[::-1]
    
    # Calculate the pitches from the peaks, calculate how many cents sharp/flat
    pitches = []
    
    for index in sorted_indices:
        pitches.append(abs(freqs[localmax_indices[index]]))
        
    note_list = [calculate_tuning(pitch)[0] for pitch in pitches]
    cent_list = [calculate_tuning(pitch)[1] for pitch in pitches]
    note_list = order_pitches(note_list)
    note_list = remove_duplicates(note_list)
    
    
    return note_list, cent_list

答案1

得分: 2

正如其他人所说,NumPy已经使用优化的本机代码执行了底层FFT计算。

真正的问题在于,2^14的变换大小对于任何实现来说都是一个不小的成本。加速它的建议如下:

  • 降低音频采样率和/或变换大小,如果可能的话。对于确定音乐音符频率来说,2^14似乎非常高。可以尝试将音频降采样到,例如,24千赫兹(例如使用librosa.resample)并使用4096的变换大小,以获得每个频谱的频率分辨率为5.9赫兹。然后使用插值来估算子频带精度的峰值。
  • 使用float32而不是float64:在变换之前,使用windowed_data = windowed_data.astype(np.float32)来将数据转换为float32。
  • 使用实数到复数FFT,使用np.fft.rfft而不是标准的复数到复数变换。这可能会节省近2倍的计算时间。
  • 批量处理一批窗口,而不是逐个处理。特别是,有些情况下,批量FFT的计算效率比单独计算FFT更高。例如,使用np.fft.rfft(x, axis=0),其中x是一个2D数组,以计算x的每一列的实际到复数FFT。
英文:

As others have said, numpy already performs the underlying FFT computation with optimized native code.

The real problem is that a transform size of 2^14 is a nontrivial cost by any implementation. Suggestions to speed it up:

  • Reduce the audio sample rate and/or the transform size, if possible. 2^14 seems very high for determining musical note frequencies. It might be enough to downsample the audio to, say, 24 kHz (e.g. with librosa.resample) and use a transform size of 4096, for a spectrum with frequency resolution of 5.9 Hz per bin. Then use interpolation to estimate peaks with sub-bin percision.
  • Use float32 instead of float64: Before the transform, use windowed_data = windowed_data.astype(np.float32) to cast to float32.
  • Use real-to-complex FFTs with np.fft.rfft instead of the standard complex-to-complex transform. This might save near a factor 2 in computation time.
  • Process a batch of windows at a time, instead of one by one. Particularly, a batch of FFTs is in some cases more efficiently computed than computing FFTs individually. Use for instance np.fft.rfft(x, axis=0) with x being a 2D array to compute the real-to-complex FFT of each column of x.

huangapple
  • 本文由 发表于 2023年6月21日 23:38:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/76525007.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定