Google Cloud Speech-to-Text returns empty transcription for OGG OPUS Base64 audio
Question
I am trying to transcribe a Base64-encoded OGG OPUS audio string with the Google Cloud Speech-to-Text API in Node.js. The audio has a sample rate of 48000 Hz. When I run my code, the API sometimes returns an empty transcription; other times it transcribes the audio just fine. When I come back to the project later, the error reappears at random. When I convert the Base64 to a Buffer and save the file, the audio plays fine in VLC, and ffprobe shows the correct information for the resulting file.
I have already checked the audio quality, encoding, sample rate, etc., but none of these checks helped. Here's my code:
import { SpeechClient } from "@google-cloud/speech";

// `base64Audio` looks like this:
// "data:audio/ogg; codecs=opus;base64,T2dnUwACAAAAAAAA..."
export async function transcribeB64(base64Audio: string): Promise<string> {
  const client = new SpeechClient();
  // Strip the data-URI header and keep only the Base64 payload.
  const content = base64Audio.split(",")[1];
  // Awaiting directly (instead of wrapping in `new Promise(async ...)`)
  // lets API errors reject the returned promise rather than being lost.
  const x = await client.recognize({
    config: {
      encoding: "OGG_OPUS",
      sampleRateHertz: 48000,
      languageCode: "en-US",
    },
    audio: {
      content,
    },
  });
  return JSON.stringify(x, null, 2);
}
The API response looks like this:
[
  {
    "results": [],
    "totalBilledTime": {
      "seconds": "0",
      "nanos": 0
    },
    "speechAdaptationInfo": null,
    "requestId": "000000"
  },
  null,
  null
]
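Because the failure shows up as an empty `results` array rather than as an error, it can help to detect that case explicitly instead of stringifying the whole response. The following is a minimal sketch, not part of the original code: `extractTranscript` and the three interfaces are hypothetical names, and the interfaces model only the fields read here (`results[].alternatives[].transcript`).

```typescript
// Minimal structural types covering only the fields this sketch reads.
// (Hypothetical names; they mirror the shape of the response shown above.)
interface Alternative {
  transcript?: string | null;
}
interface Result {
  alternatives?: Alternative[] | null;
}
interface RecognizeResponse {
  results?: Result[] | null;
}

// Returns the joined transcript, or null when nothing was recognised.
function extractTranscript(response: RecognizeResponse): string | null {
  const parts = (response.results ?? [])
    // Take the first (top-ranked) alternative of each result, if present.
    .map((r) => r.alternatives?.[0]?.transcript ?? "")
    .filter((t) => t.length > 0);
  return parts.length > 0 ? parts.join("\n") : null;
}
```

With the code above, the first element of the array returned by `client.recognize(...)` would be passed in, and a `null` return value would flag the empty-transcription case so it can be logged together with the offending audio.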
And this is the ffprobe output:
Input #0, ogg, from 'input.ogg':
  Duration: 00:00:05.74, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      ENCODER         : Mozilla111.0.1
Why is my audio not being transcribed?
Answer 1
Score: 1
I was not able to isolate a root cause, but changing the encoding from "OGG_OPUS" to "WEBM_OPUS" appears to have fixed the problem so far. I would love to hear possible explanations for why this happens, but I have none at the moment.
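One possibility, which is not confirmed anywhere in this thread, is that not every clip uses the same container: browser recorders can produce Opus in either Ogg or WebM depending on the browser. If that is the case, the encoding could be chosen per clip instead of being hard-coded. The sketch below is only an illustration under that assumption; `guessOpusEncoding` is a hypothetical helper that inspects the start of the Base64 payload, where an Ogg stream (magic bytes `OggS`) encodes to `T2dnUw` and a WebM/EBML stream (magic bytes `1A 45 DF A3`) encodes to `GkXfo`.

```typescript
type OpusEncoding = "OGG_OPUS" | "WEBM_OPUS";

// Guesses the container from the first Base64 characters of a data URI
// (or of a bare Base64 string). Returns null when neither magic matches.
function guessOpusEncoding(dataUri: string): OpusEncoding | null {
  // Drop an optional "data:...;base64," header.
  const comma = dataUri.indexOf(",");
  const payload = comma >= 0 ? dataUri.slice(comma + 1) : dataUri;

  // "OggS" followed by the zero version byte encodes to "T2dnUw...".
  if (payload.startsWith("T2dnUw")) return "OGG_OPUS";
  // The EBML header 0x1A45DFA3 used by WebM encodes to "GkXfo...".
  if (payload.startsWith("GkXfo")) return "WEBM_OPUS";
  return null;
}
```

The result could then be passed as `config.encoding` in the `recognize` request, with a `null` result treated as an unsupported or corrupted clip. Note that the sample payload in the question starts with `T2dnUw`, so it would be classified as Ogg; this check would only explain the intermittent failures if some of the other clips turn out to be WebM.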