How to recognize the sound frequency from a .WAV file in Java


Question

AudioInputStream stream = AudioSystem.getAudioInputStream(new File("file_a4.wav"));

I am looking for a way to recognise the frequency of a musical-scale sound (e.g. A4 = 440 Hz) recorded in a .wav file. I have read a lot about the FFT, but it has been suggested that the frequencies of the musical scale do not fall exactly on the FFT bins.

I have also heard about DTFT. What should I use to recognise the frequency from a sound file?
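Whatever analysis method ends up being used, the raw bytes from the `AudioInputStream` first have to be decoded into numeric samples. A minimal sketch of that step, assuming the file is 16-bit signed little-endian PCM mono (check the `AudioFormat` before decoding; the class and file names here are just placeholders):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavSamples {
    // Convert 16-bit signed little-endian PCM bytes to doubles in [-1.0, 1.0).
    static double[] toSamples(byte[] pcm) {
        double[] out = new double[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        AudioInputStream stream =
                AudioSystem.getAudioInputStream(new File("file_a4.wav"));
        AudioFormat fmt = stream.getFormat(); // verify sample rate, bit depth, channels
        double[] samples = toSamples(stream.readAllBytes());
        System.out.println(samples.length + " samples at "
                + fmt.getSampleRate() + " Hz");
    }
}
```

The sample rate from `AudioFormat` is needed later to turn lags or DFT bins into frequencies in Hz.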

Answer 1

Score: 1


What I understand from your question is that you want to recognize the musical note(s) an instrument is playing in a .wav file. If that is the case, there are several algorithms for doing that, and you could also train a neural network for the task.
Some important things to take into account are:

  1. Any instrument (the same happens for musical sounds produced by the human voice) has its own particular "color" when producing a note. This color is called the timbre (https://en.wikipedia.org/wiki/Timbre), and it is composed of the harmonic and inharmonic frequencies that surround the frequency you psychoacoustically perceive when listening to that specific note. This is why you cannot just look for the peak of an FFT to detect a musical note, and it is also the reason why a piano sounds different from a guitar when playing the same note.

  2. The analysis of an audio signal is often performed by windowing the signal and calculating the DFT of each windowed part. Each window then produces its own spectrum, and it is from the analysis of each individual spectrum and/or of how they interact that you (or your CNN, for example) will obtain your conclusions/results. This process of windowing the signal and calculating the DFTs produces a spectrogram (https://en.wikipedia.org/wiki/Spectrogram).
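As a rough sketch of point 2, the snippet below applies a Hann window to one frame and computes DFT magnitudes naively in O(n²); a real implementation would use an FFT library, but the bin layout is the same: bin k corresponds to k · sampleRate / n Hz.

```java
public class Spectrum {
    // Magnitude spectrum of one Hann-windowed frame, via a naive DFT.
    // O(n^2) — fine for a sketch; use an FFT library for real work.
    static double[] magnitudes(double[] frame) {
        int n = frame.length;
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            // Hann window to reduce spectral leakage at the frame edges
            w[i] = frame[i] * (0.5 - 0.5 * Math.cos(2 * Math.PI * i / n));
        }
        double[] mag = new double[n / 2]; // bins up to the Nyquist frequency
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int i = 0; i < n; i++) {
                double ang = -2 * Math.PI * k * i / n;
                re += w[i] * Math.cos(ang);
                im += w[i] * Math.sin(ang);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }
}
```

Sliding this over the signal frame by frame (usually with overlap) and stacking the results column by column is exactly what produces the spectrogram described above.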

After that short introduction, here are some simple algorithms for identifying single notes in a wav file. You will be able to find implementations of these algorithms, among many others, on the internet. Detecting the notes produced by chords is more complex, but it can be done with other algorithms or with neural networks.

  1. On the use of autocorrelation analysis for pitch detection: https://ieeexplore.ieee.org/document/1162905
  2. YIN algorithm: http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
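A bare-bones version of the autocorrelation idea behind the first reference might look like the sketch below (an illustration of the general approach, not the paper's exact method): correlate the signal with a delayed copy of itself and pick the lag with the strongest peak inside a plausible pitch range. The 50–1000 Hz range here is an arbitrary choice for illustration.

```java
public class AutocorrPitch {
    // Estimate the fundamental frequency (Hz) as sampleRate / bestLag, where
    // bestLag maximizes the autocorrelation within a plausible pitch range.
    static double estimate(double[] x, double sampleRate) {
        int minLag = (int) (sampleRate / 1000); // upper pitch bound: 1000 Hz
        int maxLag = (int) (sampleRate / 50);   // lower pitch bound: 50 Hz
        int bestLag = minLag;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag && lag < x.length; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < x.length; i++) {
                sum += x[i] * x[i + lag]; // correlation at this delay
            }
            if (sum > best) { best = sum; bestLag = lag; }
        }
        return sampleRate / bestLag;
    }
}
```

Because the lag is an integer number of samples, the estimate is quantized (a 440 Hz tone at 44.1 kHz comes out near 441 Hz); refining the peak position, e.g. by parabolic interpolation, is a common follow-up. YIN builds on the same idea but uses a cumulative-mean-normalized difference function that is much more robust against octave errors.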

huangapple
  • Posted on 2020-10-08 17:53:18
  • Please retain this link when republishing: https://go.coder-hub.com/64260047.html