Google Cloud Speech-to-Text returns empty transcription for OGG OPUS Base64 audio

Question

I am trying to transcribe a Base64-encoded OGG OPUS audio string using the Google Cloud Speech-to-Text API in Node.js. The audio has a sample rate of 48000 Hz. When I run my code, the API sometimes returns an empty transcription; other times it transcribes the audio just fine. When I come back to the project later, the error reappears at random. When I convert the Base64 to a Buffer and save the file, the audio plays fine in VLC, and ffprobe shows the correct information for the resulting file.

I have already tried checking the audio quality, encoding, sample rate, etc., but none of these solutions helped. Here's my code:

import { SpeechClient } from "@google-cloud/speech";

// `base64Audio` looks like this:
//   "data:audio/ogg; codecs=opus;base64,T2dnUwACAAAAAAAA..."
export async function transcribeB64(base64Audio: string): Promise<string> {
  const client = new SpeechClient();
  // Drop the "data:...;base64," header and keep only the Base64 payload.
  const content = base64Audio.split(",")[1];
  // Awaiting directly (no `new Promise` wrapper) lets API errors reject
  // the returned promise instead of leaving it pending forever.
  const response = await client.recognize({
    config: {
      encoding: "OGG_OPUS",
      sampleRateHertz: 48000,
      languageCode: "en-US",
    },
    audio: {
      content,
    },
  });
  return JSON.stringify(response, null, 2);
}

The API response looks like this:

[
  {
    "results": [],
    "totalBilledTime": {
      "seconds": "0",
      "nanos": 0
    },
    "speechAdaptationInfo": null,
    "requestId": "000000"
  },
  null,
  null
]

And this is the ffprobe output:

Input #0, ogg, from 'input.ogg':
  Duration: 00:00:05.74, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      ENCODER         : Mozilla111.0.1

Why is my audio not being transcribed?

Answer 1

Score: 1

I was not able to isolate a root cause, but changing the encoding in the request config from "OGG_OPUS" to "WEBM_OPUS" appears to have fixed the problem so far. I would love to hear possible explanations for why this happens, but I have none at the moment.
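
Since no root cause has been established, one thing that may be worth ruling out is whether every payload really is in the container that the `encoding` field claims. The sketch below is only an illustration of that check, not a confirmed fix: it looks at the first Base64 characters of the payload (an Ogg stream starts with the bytes "OggS", a WebM stream with the EBML header 0x1A 0x45 0xDF 0xA3) and builds a request object in the same shape as the one in the question. The names `stripDataUrlHeader`, `detectOpusEncoding`, `buildRecognizeRequest`, and `RecognizeRequest` are hypothetical helpers introduced here, and the sample rate and language code are simply copied from the question.

```typescript
// Encoding names used in the question ("OGG_OPUS") and in this answer ("WEBM_OPUS").
type OpusEncoding = "OGG_OPUS" | "WEBM_OPUS";

// Hypothetical type mirroring the request object passed to recognize() in the question.
interface RecognizeRequest {
  config: { encoding: OpusEncoding; sampleRateHertz: number; languageCode: string };
  audio: { content: string };
}

// Base64 of the container magic bytes:
//   "OggS" (0x4F 0x67 0x67 0x53)        -> always starts with "T2dnU"
//   EBML header (0x1A 0x45 0xDF 0xA3)   -> always starts with "GkXfo"
const OGG_PREFIX = "T2dnU";
const WEBM_PREFIX = "GkXfo";

// Remove a "data:...;base64," header if present and return only the payload.
function stripDataUrlHeader(base64Audio: string): string {
  const comma = base64Audio.indexOf(",");
  return comma === -1 ? base64Audio : base64Audio.slice(comma + 1);
}

// Guess the container from the leading Base64 characters; null means "unknown".
function detectOpusEncoding(content: string): OpusEncoding | null {
  const head = content.slice(0, OGG_PREFIX.length);
  if (head === OGG_PREFIX) return "OGG_OPUS";
  if (head === WEBM_PREFIX) return "WEBM_OPUS";
  return null;
}

// Build the request with an encoding that matches the payload's container.
// Assumption: 48000 Hz and "en-US" are taken from the question, not detected.
function buildRecognizeRequest(base64Audio: string): RecognizeRequest {
  const content = stripDataUrlHeader(base64Audio);
  const encoding = detectOpusEncoding(content);
  if (encoding === null) {
    throw new Error("Payload is neither Ogg nor WebM; refusing to guess the encoding");
  }
  return {
    config: { encoding, sampleRateHertz: 48000, languageCode: "en-US" },
    audio: { content },
  };
}
```

With this in place, the call in the question would become `client.recognize(buildRecognizeRequest(base64Audio))`. If the check shows that every failing payload is genuinely Ogg, then a container mismatch is not the explanation and the cause remains open.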

  • Posted on April 17, 2023, 01:38:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76029362.html