Error loading audio when running the OpenAI Whisper model

Question

I can't run the Whisper model on some audio files. It fails with an audio decoding error, "payload.wav: Invalid data found when processing input", raised by this line in whisper/audio.py: raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

I tried micro-machines.wav and it works fine, but other audio files give me an error.

import whisper

model = whisper.load_model("base")
# The sample file transcribes without errors.
text = model.transcribe("micro-machines.wav", fp16=False)
print(text)
# This file fails with "Invalid data found when processing input".
text = model.transcribe("payload.wav", fp16=False)
print(text)

The error I get for payload.wav:

d:\...\venv\lib\site-packages\whisper\transcribe.py:79: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
  File "d:\...\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
  File "d:\...\venv\lib\site-packages\ffmpeg\_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\....\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\.....\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\...\venv\Scripts\whisper.exe\__main__.py", line 7, in <module>
  File "d:\...\venv\lib\site-packages\whisper\transcribe.py", line 314, in cli
    result = transcribe(model, audio_path, temperature=temperature, **args)
  File "d:\...\venv\lib\site-packages\whisper\transcribe.py", line 85, in transcribe
    mel = log_mel_spectrogram(audio)
  File "d:\...\venv\lib\site-packages\whisper\audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
  File "d:\...\venv\lib\site-packages\whisper\audio.py", line 47, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 6.0-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
payload.wav: Invalid data found when processing input

I searched for a solution and found this explanation: "It appears that the code failed to load the audio file for some reason and even failed to display that error because e.stderr did not contain a valid UTF-8 string."

Can anyone guide me on how to solve this issue?

Thank you.

Answer 1

Score: 1

You must make sure that the audio file path is valid.

import whisper

model = whisper.load_model("base")

audio_path = "audios/me.m4a"  # The path to your audio file must be correct.

result = model.transcribe(audio_path, fp16=False)
print(result["text"])

More info: https://github.com/openai/whisper/discussions/301
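
A minimal sketch of that check, assuming a hypothetical file such as audios/me.m4a. The helper names require_audio_file and transcribe_file are introduced here for illustration and are not part of Whisper. The idea is to fail early with a clear message when the path does not point to a regular file, before Whisper hands it to ffmpeg. This only rules out path mistakes; a file that exists but cannot be decoded will still fail inside ffmpeg.

```python
from pathlib import Path


def require_audio_file(path_str: str) -> Path:
    """Return the absolute path, or raise if it is not an existing file.

    Hypothetical helper for illustration; it is not part of Whisper.
    """
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path


def transcribe_file(path_str: str, model_name: str = "base") -> str:
    """Validate the path first, then transcribe the file with Whisper."""
    audio_path = require_audio_file(path_str)

    # Imported lazily so the path check can be used without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path), fp16=False)
    return result["text"]
```

With Whisper and ffmpeg installed, print(transcribe_file("audios/me.m4a")) would print the transcript, or raise FileNotFoundError showing the absolute path that was actually tried.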

Answer 2

Score: 0

I ran into the same problem, and the cause turned out to be a mismatch in the audio file's name: my code referred to a .mp3 file, but the recording I actually had was a .wav file.

Also make sure you run your Python code from the directory that contains the file, because relative paths are resolved against the current working directory (it acts as the "root"). To do so, simply cd into that directory from an administrator PowerShell before running the script.
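
Both points can be checked from Python. The sketch below is an illustration under assumptions, not part of Whisper: find_audio is a hypothetical helper that resolves a file name against a directory you pass in (for example the script's own folder instead of the current working directory) and, when the exact name is missing, lists files with the same stem but a different extension, such as payload.wav when the code asks for payload.mp3.

```python
from pathlib import Path


def find_audio(name: str, base_dir: Path) -> Path:
    """Resolve `name` inside `base_dir`; suggest same-stem files if it is missing.

    Hypothetical helper for illustration; it is not part of Whisper.
    """
    candidate = base_dir / name
    if candidate.is_file():
        return candidate

    # Collect files that share the stem but differ in extension (.wav vs .mp3).
    parent = candidate.parent
    same_stem = []
    if parent.is_dir():
        same_stem = sorted(
            p.name
            for p in parent.iterdir()
            if p.is_file() and p.stem == candidate.stem
        )

    hint = f" Did you mean one of {same_stem}?" if same_stem else ""
    raise FileNotFoundError(f"{candidate} does not exist.{hint}")


# Usage inside a script: resolve against the script's folder, not the CWD.
# audio_path = find_audio("payload.wav", Path(__file__).resolve().parent)
```

Passing the script's directory explicitly makes the lookup independent of where the script is launched from, so the cd step becomes optional.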

huangapple
  • Posted on March 3, 2023 at 17:49:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75625505.html