September 12, 2020, 04:31:34

Is there a Kotlin or Java lib for listening for audio commands? (Want to trigger a Halloween display when kids yell 'TRICK OR TREAT')

Question

Goal:

Run on a low-end device (Raspberry Pi 3).
Listen for a fixed set of spoken phrase commands (my version of 'Hey Google' or 'Hey Siri').
The vocabulary can be very constrained (fewer than 10 commands).
Trigger a Kotlin function when a command is detected.
Do all of this without using a ton of CPU or a ton of network bandwidth.

AFAIK modern edge devices (Echo, smartphones, Google Home, etc.) have very fancy hardware and software that let them listen continuously for keywords without sucking up a ton of CPU and without sending all audio up to a cloud server. I'd like to have the same, but I'm not sure it is even possible. I'm sure they trained their minimal and efficient 'Hey Siri' ML models to handle all sorts of accents, volumes, cadences, ages, background noise, etc.

The Java Speech API (JSAPI) seems... iffy. Many of the examples are old, and they either point to unsupported libs or end up using Google Cloud Speech.
This doesn't have to be a Java/Kotlin library; I could also wrap a native command listener process.
I'm looking at ML Kit and Firebase ML, but I didn't see anything for converting audio to commands.
It would be best if I could tune the sensitivity: small children wearing costume masks who yell 'TWIC R TREET' or 'TMURMP... TWEEF' or whatever should still trigger it.
... But it shouldn't be a pure volume detector: a car driving by shouldn't trigger it.

Any suggestions? Or is this unreasonable to ask of a Raspberry Pi?
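
For the "wrap a native command listener process" option, a minimal Java sketch could look like the following (it can be called from Kotlin unchanged). It assumes a native program that prints one detected phrase per line on standard output. The command name passed to `listen` and the class `NativeListenerBridge` are hypothetical placeholders, not a real tool or library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical bridge between a native keyword listener and JVM callbacks.
final class NativeListenerBridge {

    /**
     * Reads one detected phrase per line and runs the handler registered for it.
     * Handler keys are expected in lower case. Returns how many handlers ran.
     */
    static int dispatch(BufferedReader lines, Map<String, Runnable> handlers) {
        int triggered = 0;
        try {
            String line;
            while ((line = lines.readLine()) != null) {
                Runnable handler = handlers.get(line.trim().toLowerCase(Locale.ROOT));
                if (handler != null) {
                    handler.run();
                    triggered++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return triggered;
    }

    /**
     * Starts the native process (the command is an assumption supplied by the
     * caller) and dispatches its output until the process closes its stdout.
     */
    static int listen(Map<String, Runnable> handlers, String... command) {
        try {
            Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                return dispatch(out, handlers);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this split, the JVM side only blocks on a pipe while the native process does the audio work, so the CPU cost depends on whichever listener is wrapped.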

Answer 1

Score: 2

You could use a library like CMU Sphinx, which works offline and does not need an online server.
The recognized results are sometimes quite inaccurate. To work around that, I used a much smaller dictionary than the one provided by default. I never fully tested it on a Raspberry Pi, but I think it should work.
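
Because the recognized text can be inaccurate, one possible complement to a small dictionary is to compare each recognizer hypothesis against the short command list with a tunable similarity threshold. The sketch below is an assumption for illustration, not part of CMU Sphinx: `CommandMatcher` is a hypothetical helper that uses normalized Levenshtein distance, and the threshold plays the role of the "sensitivity" knob.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: fuzzy-matches recognizer output against a few commands.
final class CommandMatcher {
    private final List<String> commands;
    private final double threshold; // 1.0 = exact match only; lower = more lenient

    CommandMatcher(List<String> commands, double threshold) {
        this.commands = commands;
        this.threshold = threshold;
    }

    /** Returns the most similar command if its similarity reaches the threshold. */
    Optional<String> match(String hypothesis) {
        String heard = normalize(hypothesis);
        String best = null;
        double bestScore = 0.0;
        for (String command : commands) {
            double score = similarity(heard, normalize(command));
            if (score > bestScore) {
                bestScore = score;
                best = command;
            }
        }
        return bestScore >= threshold ? Optional.ofNullable(best) : Optional.empty();
    }

    /** Lower-cases, drops everything except letters and spaces, collapses whitespace. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z ]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** 1.0 for identical strings, approaching 0.0 as the edit distance grows. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    /** Classic two-row dynamic-programming edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `new CommandMatcher(Arrays.asList("trick or treat"), 0.6).match("TWIC R TREET")` returns the command, while a threshold of 0.9 rejects the same input. The right threshold would need tuning against real recordings, and heavier mumbling would need a lower value.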

Answer 2

Score: 1

Yes, there is a quite useful library that I recommend: CMU Sphinx, https://cmusphinx.github.io/
