Is there a Kotlin or Java lib for listening for audio commands? (Want to trigger a Halloween display when kids yell 'TRICK OR TREAT')
Question
Goal:
- On a low-end device (Raspberry Pi 3)
- Listen for a fixed set of audio phrase commands (my version of 'Hey Google' or 'Hey Siri')
- That can be a very constrained vocabulary (fewer than 10 commands)
- Trigger a Kotlin function when the command is detected.
- Without using a ton of CPU, or a ton of network bandwidth.
AFAIK modern edge devices (Echo, smartphones, Google Home, etc.) have very fancy hardware+software solutions that let them listen continuously for keywords without sucking up a ton of CPU and without sending all audio up to a cloud server. I'd like to have the same, but I'm not sure it is even possible: I'm sure they trained their minimal and efficient 'Hey Siri' ML models to handle all sorts of accents, volumes, cadences, ages, background noise, etc.
- The Java Speech API (JSAPI) seems... iffy. Many of the examples are old, and either point to unsupported libs or end up using Google Cloud Speech.
- This doesn't have to be a Java/Kotlin library; I could also wrap a native command listener process.
- I've looked at ML Kit and Firebase ML, but didn't see audio-to-command conversion.
- It would be best if I could tune the sensitivity: small children wearing costume masks and yelling 'TWIC R TREET' or 'TMURMP... TWEEF' or whatever should still trigger it.
- ... But not a pure volume detector; a car driving by shouldn't trigger it.
Any suggestions? Or is this unreasonable to ask of an rpi?
Answer 1
Score: 2
You could use a library like CMU Sphinx, which works offline and doesn't require any online servers.
Sometimes the recognized results are quite inaccurate. To work around that, I used a much smaller dictionary than the default one, so the recognizer has fewer words to confuse. I never fully tested it on a Raspberry Pi, but I think it should work.
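Even with a small dictionary, the text that comes back will rarely be an exact 'trick or treat', so one option is to fuzzy-match it against your command list and expose the match threshold as the "sensitivity" knob. Below is a minimal sketch in plain Java (callable from Kotlin unchanged), assuming the recognizer hands you one text hypothesis per utterance. `CommandMatcher`, the command list, and the 0.6 threshold are illustrative choices of mine, not part of CMU Sphinx, and the similarity measure is ordinary normalized Levenshtein distance.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Maps noisy recognizer output onto a small, fixed set of commands. */
public class CommandMatcher {
    private final List<String> commands;
    private final double minSimilarity; // 0.0 = accept anything, 1.0 = exact match only

    public CommandMatcher(List<String> commands, double minSimilarity) {
        this.commands = commands;
        this.minSimilarity = minSimilarity;
    }

    /** Lowercases and strips everything except ASCII letters and digits. */
    public static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 minus edit distance divided by the longer normalized length. */
    public static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 0.0; // nothing to compare
        }
        return 1.0 - (double) levenshtein(x, y) / longest;
    }

    /** Returns the best-scoring command at or above the threshold, if any. */
    public Optional<String> match(String hypothesis) {
        String best = null;
        double bestScore = minSimilarity;
        for (String command : commands) {
            double score = similarity(hypothesis, command);
            if (score >= bestScore) {
                best = command;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        CommandMatcher matcher =
                new CommandMatcher(List.of("trick or treat", "happy halloween"), 0.6);
        for (String heard : new String[] {"TWIC R TREET", "a car drives by"}) {
            System.out.println(heard + " -> " + matcher.match(heard).orElse("(no command)"));
        }
    }
}
```

From Kotlin, something like `CommandMatcher(listOf("trick or treat"), 0.6).match(text).ifPresent { triggerDisplay() }` would hook it up, where `triggerDisplay` is a placeholder for your own function. Lowering the threshold makes it more forgiving of mumbled input at the cost of more false triggers, so it is worth tuning against real recordings.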
Answer 2
Score: 1
Yes, there is a quite useful library that I recommend: https://cmusphinx.github.io/