April 17, 2023 01:38:48

Google Cloud Speech-to-Text returns empty transcription for OGG OPUS Base64 audio

Question

I am trying to transcribe an OGG OPUS Base64 encoded audio string using Google Cloud Speech-to-Text API in Node.js. The audio has a sample rate of 48000 hertz. When I run my code, the API returns an empty transcription. This only happens sometimes. Other times, it transcribes the audio just fine. I will return to the project later and find that the error returns randomly. When I convert the Base64 to a Buffer and save the file, the audio plays fine in VLC player, and ffprobe shows the correct information for the resulting file.

I have already tried checking the audio quality, encoding, sample rate, etc., but none of these solutions helped. Here's my code:

import { SpeechClient } from "@google-cloud/speech";

// `base64Audio` looks like this:
//   "data:audio/ogg; codecs=opus;base64,T2dnUwACAAAAAAAA..."
export async function transcribeB64(base64Audio: string): Promise<string> {
  const client = new SpeechClient();
  return new Promise(async (resolve) => {
    const content = base64Audio.split(",")[1];
    const x = await client.recognize({
      config: {
        encoding: "OGG_OPUS",
        sampleRateHertz: 48000,
        languageCode: "en-US",
      },
      audio: {
        content,
      },
    });
    resolve(JSON.stringify(x, null, 2));
  });
}

The API response looks like this:

[
  {
    "results": [],
    "totalBilledTime": {
      "seconds": "0",
      "nanos": 0
    },
    "speechAdaptationInfo": null,
    "requestId": "000000"
  },
  null,
  null
]

And this is the ffprobe output:

Input #0, ogg, from 'input.ogg':
  Duration: 00:00:05.74, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      ENCODER         : Mozilla111.0.1

Why is my audio not being transcribed?

Answer 1

Score: 1

I was not able to isolate a root cause, but changing the encoding from "OGG_OPUS" to "WEBM_OPUS" appears to have fixed the problem so far. I would love to hear possible explanations of why this is happening, but I have none at the moment.
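
Since the answer leaves the root cause open, one thing that may be worth ruling out is a mismatch between the `encoding` sent to the API and the container the recording actually uses. This is an assumption, not a confirmed explanation for this thread. The sketch below guesses the container from the first bytes of the Base64 payload: Ogg streams begin with the ASCII signature `OggS`, and WebM (Matroska) files begin with the EBML header bytes `1A 45 DF A3`. The helper name `detectOpusEncoding` is hypothetical and is not part of `@google-cloud/speech`. It also assumes a runtime where the global `atob` is available, such as a browser or a recent Node.js.

```typescript
// Hypothetical helper, not part of @google-cloud/speech.
// Guesses the container of a Base64 audio payload from its leading bytes.
type OpusEncoding = "OGG_OPUS" | "WEBM_OPUS";

const OGG_MAGIC = [0x4f, 0x67, 0x67, 0x53]; // ASCII "OggS"
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3]; // EBML header (WebM/Matroska)

function detectOpusEncoding(dataUri: string): OpusEncoding | undefined {
  // Accept either a full data URI or a bare Base64 string.
  const comma = dataUri.indexOf(",");
  const base64 = comma >= 0 ? dataUri.slice(comma + 1) : dataUri;
  if (base64.length < 8) return undefined;

  // 8 Base64 characters decode to 6 bytes, enough for a 4-byte signature.
  let head: string;
  try {
    head = atob(base64.slice(0, 8));
  } catch {
    return undefined; // not valid Base64
  }

  const startsWith = (magic: number[]): boolean =>
    magic.every((byte, i) => head.charCodeAt(i) === byte);

  if (startsWith(OGG_MAGIC)) return "OGG_OPUS";
  if (startsWith(EBML_MAGIC)) return "WEBM_OPUS";
  return undefined; // unknown container; the signature only identifies the container, not the codec
}
```

The return value could be logged next to each request, or passed as the `encoding` field of the `config` object in the question's `recognize` call, to see whether the empty results line up with one container or the other. The sample string in the question starts with `T2dnUw`, which decodes to `OggS`, so this check alone would not explain that particular recording.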
