Python/Audio Classification - Split audio file based on repetition
Question
I'm creating an audio classification model for animal sounds. It's a hobby project, just to familiarize myself with the techniques. What I'm struggling with is that my audio clips differ in duration, and I'm not sure how to cut them to similar lengths. My question is not so much about how to split them (I found many examples of how to split audio files) but about the duration itself.
My files have some silences, but mainly a lot of repetitive sounds, as the dataset is mostly insects. An insect such as a cricket will make a similar, repetitive sound for a long time. So my idea was: if there is a way to detect repetitions in audio files, use that to split each file. Then see what the duration of the longest clip is, and use that as the duration for cutting all the audio files.
But maybe I'm thinking about it all wrong. Does anybody have any suggestions or good literature for me?
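For reference, the repetition idea above could be sketched roughly as follows. This is only one possible approach, not something taken from the post: it assumes the repetition shows up in the loudness envelope, and it estimates the period from the autocorrelation of that envelope. The function name `estimate_repetition_period` and the parameters `frame_len` and `min_period_s` are hypothetical, chosen for illustration.

```python
import numpy as np


def estimate_repetition_period(signal, sample_rate, frame_len=256, min_period_s=0.05):
    """Estimate the dominant repetition period (in seconds) of a mono signal.

    Assumption: the repetition is visible in the amplitude envelope,
    as with evenly spaced cricket chirps.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len

    # RMS envelope: one value per non-overlapping frame.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    envelope = np.sqrt((frames ** 2).mean(axis=1))
    envelope = envelope - envelope.mean()

    # Autocorrelation of the envelope, keeping non-negative lags only.
    acf = np.correlate(envelope, envelope, mode="full")[n_frames - 1:]

    frame_rate = sample_rate / frame_len
    min_lag = max(1, int(round(min_period_s * frame_rate)))
    max_lag = n_frames // 2
    if max_lag <= min_lag:
        raise ValueError("signal is too short for the requested minimum period")

    # The strongest autocorrelation peak past min_lag is taken as the period.
    best_lag = min_lag + int(np.argmax(acf[min_lag:max_lag]))
    return best_lag / frame_rate
```

The estimate is only as fine as one frame (`frame_len / sample_rate` seconds), and it will be unreliable for clips where the sound is continuous rather than pulsed.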
Answer 1
Score: 1
Having recently classified insect sounds myself (grasshoppers, cicadas, etc.), I can tell you that you will probably need audio chunks of various sizes. I experimented with sizes between 0.5 and 60 seconds, and they all show specific patterns that carry valuable information.
To get better results I did two things. First, I combined a longer time window with a short focus time window. Example 1 shows the spectrogram of a long 60-second time window (upper part) with a 0.6-second focus window. In Example 2, I combined a long 40-second time window with four 2-second focus windows.
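A minimal sketch of how such a long window and its focus windows could be cut from a clip is shown below. How the focus windows were actually positioned is not stated here, so spacing them evenly across the long window is an assumption, and the function name `long_and_focus_windows` and its parameters are hypothetical. It works on any sliceable sequence of samples, such as a list or a NumPy array.

```python
def long_and_focus_windows(samples, sample_rate, long_s=40.0, focus_s=2.0,
                           n_focus=4, start_s=0.0):
    """Return one long window and n_focus short focus windows cut from it.

    Assumption: each focus window is centred in one of n_focus
    equal-length segments of the long window.
    """
    start = int(round(start_s * sample_rate))
    long_len = int(round(long_s * sample_rate))
    focus_len = int(round(focus_s * sample_rate))

    long_window = samples[start:start + long_len]
    if len(long_window) < long_len:
        raise ValueError("clip is shorter than the requested long window")
    if n_focus * focus_len > long_len:
        raise ValueError("focus windows do not fit inside the long window")

    segment = long_len // n_focus
    focus_windows = []
    for i in range(n_focus):
        # Centre the focus window inside segment i.
        offset = i * segment + (segment - focus_len) // 2
        focus_windows.append(long_window[offset:offset + focus_len])
    return long_window, focus_windows
```

Each returned window could then be turned into its own spectrogram and fed to the classifier, either as separate inputs or stacked into one image.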
A final step can be applied across all of the different time windows: you can use an ensemble method, such as voting, to improve the results.
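As an illustration of the voting step, here is a small sketch of hard majority voting over the labels predicted for the individual windows of one recording. The function name `majority_vote` is hypothetical, and plain majority voting is just one simple ensemble scheme; averaging predicted probabilities would be another.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often across the windows.

    On a tie, the label that appears first in `predictions` wins,
    because Counter.most_common keeps first-encountered order.
    """
    if not predictions:
        raise ValueError("no predictions to vote on")
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical labels predicted for three windows of the same recording.
window_labels = ["cricket", "cicada", "cricket"]
final_label = majority_vote(window_labels)
```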