Is there a Kotlin or Java lib for listening for audio commands? (Want to trigger a Halloween display when kids yell 'TRICK OR TREAT')

Question

Goal:

  1. On a low-end device (Raspberry Pi 3)
  2. Listen for a fixed set of audio phrase commands (my version of 'Hey Google' or 'Hey Siri')
  3. The vocabulary can be very constrained (fewer than 10 commands)
  4. Trigger a Kotlin function when a command is detected
  5. Without using a ton of CPU or a ton of network bandwidth

AFAIK modern edge devices (Echo, smartphones, Google Home, etc.) have very fancy hardware+software solutions that let them continuously listen for keywords without sucking up a ton of CPU and without having to send all audio up to a cloud server. I'd like to have the same, but am not sure it is even possible: I'm sure they trained their minimal and efficient 'Hey Siri' ML model to handle all sorts of accents, volumes, cadences, ages, background noise, etc.

  • The Java Speech API (JSAPI) seems... iffy. Many of the examples are old, and either point to unsupported libs, or ended up using Google Cloud Speech.
  • This doesn't have to be a Java/Kotlin library; I could also wrap a native command listener process.
  • I looked at ML Kit and Firebase ML, but didn't see audio-to-command conversion.
  • It would be best if I could tune the sensitivity: small children in costume masks yelling 'TWIC R TREET' or 'TMURMP... TWEEF' or whatever should still trigger it.
  • ... But it shouldn't be a pure volume detector; a car driving by shouldn't trigger it.

Any suggestions? Or is this unreasonable to ask of a Raspberry Pi?

Answer 1

Score: 2

You could use a library like CMU Sphinx, which works offline and doesn't require any online servers.
The recognized results are sometimes quite inaccurate. To work around that, I used a much smaller dictionary than the default one. I never fully tested it on a Raspberry Pi, but I think it should work.
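
One way to cope with inaccurate results, and to get the tunable sensitivity the question asks for, might be to fuzzy-match whatever text the recognizer returns against the short command list. The sketch below is only an illustration and is not part of CMU Sphinx: `CommandMatcher`, its `threshold` parameter, and the example phrases are hypothetical names, and it assumes the recognizer hands back a plain text transcript.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: maps a noisy transcript to one of a few known commands. */
final class CommandMatcher {
    private final List<String> commands;
    private final double threshold; // assumed knob: 1.0 = exact match only, lower = more lenient

    CommandMatcher(List<String> commands, double threshold) {
        this.commands = commands;
        this.threshold = threshold;
    }

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    /** Lowercases, drops everything except a-z and spaces, collapses whitespace. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z ]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Returns the best-scoring command if its similarity reaches the threshold. */
    Optional<String> match(String transcript) {
        String heard = normalize(transcript);
        String best = null;
        double bestScore = 0.0;
        for (String command : commands) {
            double score = similarity(heard, normalize(command));
            if (score > bestScore) {
                best = command;
                bestScore = score;
            }
        }
        return bestScore >= threshold ? Optional.ofNullable(best) : Optional.empty();
    }
}
```

With a command list of "trick or treat" and "happy halloween" and a threshold of 0.6, a transcript like "TWIC R TREET" would still map to "trick or treat", while unrelated text such as "hello there" would be rejected. Raising the threshold makes the trigger stricter. This only helps when the recognizer produces some text; it does not replace a small dictionary or a proper keyword-spotting model.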

Answer 2

Score: 1

Yes, there is a quite useful library that I recommend: https://cmusphinx.github.io/

huangapple
  • Posted on September 12, 2020 at 04:31:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63853946.html