Not getting the full audio into text

Question

I tried the code below:

import speech_recognition as sr

r = sr.Recognizer()
filename = "demo.wav"

with sr.AudioFile(filename) as source:
    audio_data = r.record(source)
    text = r.recognize_google(audio_data)
    print(text)

from here

It gives the following output:

result2:
{   'alternative': [   {   'confidence': 0.92995489,
                           'transcript': 'talking nonsense'},
                       {'transcript': 'you talking nonsense'},
                       {'transcript': 'are you talking nonsense'},
                       {'transcript': 'Divya talking nonsense'},
                       {'transcript': 'are talking nonsense'}],
    'final': True}
talking nonsense

But the audio file contains:
"I believe you're just talking nonsense"

Why is it not transcribing the whole audio?
Please help me figure it out.

Thank you

Answer 1

Score: 1

The function recognize_google "performs speech recognition using the Google Speech Recognition API."

Speech recognition can obviously never reproduce the input with 100% accuracy.

As stated in the documentation of the function recognize_google:

> Returns the most likely transcription if show_all is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
>
> Raises a speech_recognition.UnknownValueError exception if the speech is unintelligible. Raises a speech_recognition.RequestError exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.

The first lines ("result2:" and the dictionary) that you see are debug output printed by the function recognize_google itself, not its return value (see source line #918).

The last line ("talking nonsense") is the actual return value of recognize_google, which is chosen based on the confidence values of the different hypotheses (see source lines #921ff).

If you want to get the full result, add the argument show_all=True to recognize_google. See the example below.

The following example shows how to test it without having to record a WAV file. The WAV file is generated by espeak (present in most Linux distros).

import pprint
import subprocess

import speech_recognition as sr

wave_file = '/path/to/your/wavefile.wav'
text = "I believe you are just talking nonsense"

# Synthesize the test sentence into a WAV file with espeak
# (-a amplitude, -s speed, -w output file); fail loudly if espeak fails.
subprocess.run(['espeak', '-a', '200', '-s', '130', '-w', wave_file, text], check=True)

recognizer = sr.Recognizer()

# Read the whole file into an AudioData object.
with sr.AudioFile(wave_file) as source:
    audio_data = recognizer.record(source)

if audio_data is not None:
    # show_all=True returns the raw API response with every hypothesis
    # instead of only the most likely transcript.
    recognized_text = recognizer.recognize_google(audio_data, show_all=True)
    pprint.pprint(recognized_text)

Output:

{'alternative': [{'confidence': 0.88625956,
                  'transcript': "I'm talking nonsense"},
                 {'transcript': 'talking nonsense'},
                 {'transcript': "I'm talking London"},
                 {'transcript': 'talking London'},
                 {'transcript': "I'm talking now"}],
 'final': True}
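
The raw dictionary returned with show_all=True can then be inspected in your own code, for example to list every hypothesis or to pick one by confidence. The sketch below is an illustration only: the helper names `list_transcripts` and `best_transcript` are hypothetical, the selection rule is a simplified stand-in and not necessarily the library's exact logic, and it assumes that a response without results is not a dictionary (for example an empty list).

```python
def list_transcripts(response):
    """Return every hypothesis transcript from a show_all=True response."""
    # Assumption: a response without results is not a dict (e.g. an empty list).
    if not isinstance(response, dict):
        return []
    return [alt["transcript"] for alt in response.get("alternative", [])]


def best_transcript(response):
    """Return the highest-confidence transcript, or None if there is none."""
    if not isinstance(response, dict):
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # Only some hypotheses carry a confidence value; prefer those.
    scored = [alt for alt in alternatives if "confidence" in alt]
    if scored:
        return max(scored, key=lambda alt: alt["confidence"])["transcript"]
    # Otherwise fall back to the first hypothesis.
    return alternatives[0]["transcript"]


# The response shown in the answer above.
response = {
    "alternative": [
        {"confidence": 0.88625956, "transcript": "I'm talking nonsense"},
        {"transcript": "talking nonsense"},
        {"transcript": "I'm talking London"},
        {"transcript": "talking London"},
        {"transcript": "I'm talking now"},
    ],
    "final": True,
}

print(list_transcripts(response))
print(best_transcript(response))  # I'm talking nonsense
```

Listing all hypotheses makes it easy to see whether any of them is closer to the spoken sentence than the one returned by default.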

huangapple
  • Posted on January 9, 2023 at 18:41:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75056073.html