January 9, 2023, 18:41:02

Not getting the full audio into text

Question

I tried the following code:

import speech_recognition as sr

r = sr.Recognizer()
filename = "demo.wav"

with sr.AudioFile(filename) as source:
    audio_data = r.record(source)
    text = r.recognize_google(audio_data)
    print(text)

The code comes from here.

It gives the following output:

result2:
{'alternative': [{'confidence': 0.92995489,
                  'transcript': 'talking nonsense'},
                 {'transcript': 'you talking nonsense'},
                 {'transcript': 'are you talking nonsense'},
                 {'transcript': 'Divya talking nonsense'},
                 {'transcript': 'are talking nonsense'}],
 'final': True}
talking nonsense

But the audio file contains:

"I believe you're just talking nonsense"

Why isn't it transcribing the whole audio?
Please help me figure it out.

Thanks

Answer 1

Score: 1

The function recognize_google "performs speech recognition using the Google Speech Recognition API."

Speech recognition can never be guaranteed to reproduce the input exactly.

As stated in the documentation of the function recognize_google:

> Returns the most likely transcription if show_all is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
>
> Raises a speech_recognition.UnknownValueError exception if the speech is unintelligible. Raises a speech_recognition.RequestError exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.

The first lines ("result2:" and the dictionary that follows) are not the return value of recognize_google; they are printed by the function itself (see source line #918).

The last line ("talking nonsense") is the actual return value of recognize_google, which is chosen based on the confidence values of the different hypotheses (see source lines #921ff).
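
To make the selection step concrete, here is a minimal sketch of picking one transcript from a response shaped like the one printed in the question. The function name pick_best_transcript is hypothetical, and the ranking rule (highest confidence wins, entries without a confidence key count as 0.0) is an assumption made for illustration; it is not copied from the library's source.

```python
def pick_best_transcript(response):
    """Return the transcript of the highest-confidence alternative, or None."""
    # Anything that is not a dictionary is treated as "no result" in this sketch.
    alternatives = response.get("alternative", []) if isinstance(response, dict) else []
    if not alternatives:
        return None
    # Assumption of this sketch: a missing "confidence" key ranks as 0.0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]


# The response printed in the question.
response = {
    "alternative": [
        {"confidence": 0.92995489, "transcript": "talking nonsense"},
        {"transcript": "you talking nonsense"},
        {"transcript": "are you talking nonsense"},
        {"transcript": "Divya talking nonsense"},
        {"transcript": "are talking nonsense"},
    ],
    "final": True,
}

print(pick_best_transcript(response))  # prints: talking nonsense
```

Only the first alternative carries a confidence value here, which is why the shorter "talking nonsense" is returned even though other hypotheses contain more words.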

If you want the full result, pass the argument show_all=True to recognize_google, as in the example below.

The following example shows how to test this without having to record a WAV file. The file is generated by espeak (available in most Linux distributions).

import pprint
import subprocess

import speech_recognition as sr

wave_file = '/path/to/your/wavefile.wav'
text = "I believe you are just talking nonsense"

# Synthesize the sentence into a WAV file
# (-a amplitude, -s speed in words per minute, -w output file).
subprocess.run(['espeak', '-a', '200', '-s', '130', '-w', wave_file, text], check=True)

recognizer = sr.Recognizer()

# Read the whole file into an AudioData object.
with sr.AudioFile(wave_file) as source:
    audio_data = recognizer.record(source)

if audio_data is not None:
    # show_all=True returns the raw API response instead of only the best transcript.
    recognized_text = recognizer.recognize_google(audio_data, show_all=True)
    pprint.pprint(recognized_text)

Output:

{'alternative': [{'confidence': 0.88625956,
                  'transcript': "I'm talking nonsense"},
                 {'transcript': 'talking nonsense'},
                 {'transcript': "I'm talking London"},
                 {'transcript': 'talking London'},
                 {'transcript': "I'm talking now"}],
 'final': True}
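
Once you have the raw response, you can post-process it yourself. As a minimal sketch, the hypothetical helper list_transcripts below collects every candidate transcript from a response shaped like the one above. It assumes that a failed recognition yields something other than a dictionary (for example an empty list), which is an assumption about the raw response rather than documented behavior. Picking the longest candidate is only one possible heuristic, not something the library recommends.

```python
def list_transcripts(raw_response):
    """Collect all candidate transcripts from a raw show_all=True response."""
    # Assumption: a non-dictionary response (e.g. an empty list) means "nothing recognized".
    if not isinstance(raw_response, dict):
        return []
    return [
        alt["transcript"]
        for alt in raw_response.get("alternative", [])
        if "transcript" in alt
    ]


# The raw response printed above.
raw_response = {
    "alternative": [
        {"confidence": 0.88625956, "transcript": "I'm talking nonsense"},
        {"transcript": "talking nonsense"},
        {"transcript": "I'm talking London"},
        {"transcript": "talking London"},
        {"transcript": "I'm talking now"},
    ],
    "final": True,
}

transcripts = list_transcripts(raw_response)
for transcript in transcripts:
    print(transcript)

# One possible heuristic: prefer the candidate with the most characters.
longest = max(transcripts, key=len) if transcripts else None
print("Longest candidate:", longest)
```

Inspecting all candidates this way makes it easier to see whether the missing words appear in any hypothesis at all, or whether the service never recognized them.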
