Not getting the full audio into text
Question
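I have tried the following code:
import speech_recognition as sr
# Create a recognizer and load the whole WAV file into memory
r = sr.Recognizer()
filename = "demo.wav"
with sr.AudioFile(filename) as source:
    audio_data = r.record(source)
    # Send the audio to the Google Speech Recognition API and print the most likely transcription
    text = r.recognize_google(audio_data)
    print(text)
(code taken from here)
It gives the following output: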
result2:
{   'alternative': [   {   'confidence': 0.92995489,
                           'transcript': 'talking nonsense'},
                       {'transcript': 'you talking nonsense'},
                       {'transcript': 'are you talking nonsense'},
                       {'transcript': 'Divya talking nonsense'},
                       {'transcript': 'are talking nonsense'}],
    'final': True}
talking nonsense
But the audio file contains:
"I believe you're just talking nonsense"
Why is it not transcribing the whole audio?
Please help me figure it out.
Thank you
Answer 1
Score: 1
The function recognize_google "performs speech recognition using the Google Speech Recognition API."
Speech recognition can obviously never match the input 100% exactly.
As stated in the documentation of the function recognize_google:
> Returns the most likely transcription if show_all is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
>
> Raises a speech_recognition.UnknownValueError exception if the speech is unintelligible. Raises a speech_recognition.RequestError exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
The first lines you see ("result2:" and the dictionary below it) are output printed by recognize_google itself (see source line #918), not its return value.
The last line ("talking nonsense") is the actual return value of recognize_google, chosen from the different hypotheses based on their confidence values (see source lines #921ff).
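To make the selection step concrete, here is a minimal sketch that picks one transcript from a raw response dictionary such as the one printed after "result2:". It follows the rule described above (highest confidence wins, and without any confidence score the first hypothesis is taken), but it is an illustration, not the library's own code, and the helper name best_transcript is hypothetical.

```python
from typing import Any


def best_transcript(raw_result: Any) -> str:
    """Pick a single transcript from a raw recognize_google(..., show_all=True) response.

    Hypothetical helper: prefers the alternative with the highest 'confidence';
    if no alternative carries a confidence score, falls back to the first one.
    """
    # Guard against an empty or non-dict response.
    if not isinstance(raw_result, dict) or not raw_result.get("alternative"):
        raise ValueError("response contains no alternatives")

    alternatives = raw_result["alternative"]
    # Only some alternatives carry a 'confidence' key (in the outputs shown here, only the first).
    scored = [alt for alt in alternatives if "confidence" in alt]
    if scored:
        best = max(scored, key=lambda alt: alt["confidence"])
    else:
        best = alternatives[0]
    return best["transcript"]


# The raw response printed in the question (after "result2:").
result2 = {
    "alternative": [
        {"confidence": 0.92995489, "transcript": "talking nonsense"},
        {"transcript": "you talking nonsense"},
        {"transcript": "are you talking nonsense"},
        {"transcript": "Divya talking nonsense"},
        {"transcript": "are talking nonsense"},
    ],
    "final": True,
}

print(best_transcript(result2))  # prints: talking nonsense
```

Applied to the dictionary from the question, it returns "talking nonsense", the same string the question's code printed.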
If you want the full result, add the argument show_all=True to recognize_google. See the example below.
The following example shows how to test this without having to record a WAV file yourself. The WAV file is generated by espeak (available in most Linux distributions).
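import speech_recognition as sr
import subprocess
import pprint
wave_file = '/path/to/your/wavefile.wav'
text = "I believe you are just talking nonsense"
# Synthesize the sentence into a WAV file with espeak
# (-a amplitude, -s speed in words per minute, -w write output to a WAV file)
proc = subprocess.Popen(['espeak', '-a', '200', '-s', '130', '-w', wave_file, text])
proc.communicate()  # wait for espeak to finish writing the file
recognizer = sr.Recognizer()
with sr.AudioFile(wave_file) as source:
    audio_data = recognizer.record(source)  # read the entire file
if audio_data is not None:
    # show_all=True returns the raw API response with all alternatives
    recognized_text = recognizer.recognize_google(audio_data, show_all=True)
    pprint.pprint(recognized_text)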
Output:
{'alternative': [{'confidence': 0.88625956,
                  'transcript': "I'm talking nonsense"},
                 {'transcript': 'talking nonsense'},
                 {'transcript': "I'm talking London"},
                 {'transcript': 'talking London'},
                 {'transcript': "I'm talking now"}],
 'final': True}
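With show_all=True it is easy to inspect every hypothesis the API returned. The sketch below lists the alternatives together with their confidence, scored ones first; alternatives without a 'confidence' key are shown as n/a and keep the order in which the API returned them. The helper ranked_alternatives is hypothetical and not part of speech_recognition.

```python
def ranked_alternatives(raw_result):
    """Return (transcript, confidence) pairs from a raw show_all=True response.

    Hypothetical helper: alternatives without a 'confidence' key get None.
    """
    if not isinstance(raw_result, dict):
        return []
    pairs = [
        (alt.get("transcript", ""), alt.get("confidence"))
        for alt in raw_result.get("alternative", [])
    ]
    # Scored alternatives first (highest confidence first); sorted() is stable,
    # so unscored alternatives keep the order the API returned them in.
    return sorted(pairs, key=lambda p: (p[1] is None, -(p[1] or 0.0)))


# The raw response from the espeak example above.
raw = {
    "alternative": [
        {"confidence": 0.88625956, "transcript": "I'm talking nonsense"},
        {"transcript": "talking nonsense"},
        {"transcript": "I'm talking London"},
        {"transcript": "talking London"},
        {"transcript": "I'm talking now"},
    ],
    "final": True,
}

for transcript, confidence in ranked_alternatives(raw):
    label = f"{confidence:.2f}" if confidence is not None else "n/a"
    print(f"{label:>5}  {transcript}")
```

Run on the output above, it shows that none of the returned hypotheses contains the full sentence, so in this case show_all=True reveals the alternatives but does not by itself recover the missing words.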