How to send an m4a file from FastAPI to OpenAI transcribe

Question

I'm trying to get an m4a file transcribed. I receive the file at a FastAPI endpoint and then try to send it to OpenAI's transcription API, but the format/shape seems to be off. How can I turn the UploadFile into something that OpenAI will accept? The OpenAI docs for transcription essentially say:

> The transcriptions API takes as input the audio file you want to transcribe and the desired output file format for the transcription of the audio. We currently support multiple input and output file formats.

Here's my current code:

@app.post("/transcribe")
async def transcribe_audio_file(file: UploadFile = File(...)):
    contents = await file.read()
    contents_str = contents.decode()
    buffer = io.StringIO(contents_str)

    transcript_response = openai.Audio.transcribe("whisper-1", buffer)

I've tried several variations of the code above. Each one fails with the error shown in its comment:

    transcript_response = openai.Audio.transcribe("whisper-1", file) # AttributeError: 'UploadFile' object has no attribute 'name'
    transcript_response = openai.Audio.transcribe("whisper-1", contents) # AttributeError: 'bytes' object has no attribute 'name'
    transcript_response = openai.Audio.transcribe("whisper-1", contents_str) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 13: invalid start byte
    transcript_response = openai.Audio.transcribe("whisper-1", buffer) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 13: invalid start byte

I have something similar working in a vanilla CLI python script that looks like this:

audio_file = open("./audio-file.m4a", "rb")
transcript_response = openai.Audio.transcribe("whisper-1", audio_file)

So I also tried using a method like that:

    with open(file.filename, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)

But that gave the error:

FileNotFoundError: [Errno 2] No such file or directory: '6ad52ad0-2fce-4d79-b4ac-e154379ceacd'

Any tips on how to debug this myself are also welcome. I'm coming from TypeScript land.

Answer 1

Score: 1

As mentioned in this question, the solution is as follows:

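import io
import openai
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe_audio_file(file: UploadFile = File(...)):
    audio = await file.read()   # raw bytes of the uploaded file
    buffer = io.BytesIO(audio)  # binary file-like object; the bytes are never decoded to str
    buffer.name = 'audio.m4a'   # the client looks for a .name attribute; pretty sure any string here will do
    transcript_response = openai.Audio.transcribe("whisper-1", buffer)
    return transcript_response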
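
The errors in the question hint at why this works: the client complains that `UploadFile` and `bytes` have no `name` attribute, and decoding binary audio as UTF-8 fails outright. The sketch below is not part of the original answer. It isolates the wrapping step in a hypothetical helper, `to_named_buffer`, and uses made-up stand-in bytes instead of a real upload, so it runs without FastAPI or an OpenAI key. The default filename is an assumption; it is probably safest to keep an extension that matches the actual audio format.

```python
import io


def to_named_buffer(data: bytes, filename: str = "audio.m4a") -> io.BytesIO:
    """Wrap raw bytes in a binary file-like object that carries a .name attribute.

    Hypothetical helper for illustration; the default filename is an assumption.
    """
    buffer = io.BytesIO(data)  # binary stream, so no text decoding takes place
    buffer.name = filename     # mimics the .name that a file from open(..., "rb") has
    return buffer


# Stand-in for uploaded audio: arbitrary bytes that are not valid UTF-8.
fake_audio = b"\x00\x00\x00\x1cftypM4A \x86\x00"

try:
    fake_audio.decode()  # the step the io.StringIO attempt depended on
except UnicodeDecodeError as exc:
    print("decoding binary audio as text fails:", exc.reason)

buffer = to_named_buffer(fake_audio)
print("name:", buffer.name)
print("bytes preserved:", buffer.read() == fake_audio)
```

Inside the endpoint, the same helper could be called as `to_named_buffer(await file.read(), file.filename or "audio.m4a")`, assuming the client sends a filename with a usable extension. In the question's traceback the filename was a bare UUID, so a fixed name like the one in the answer may be the more reliable choice.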

huangapple
  • Posted on June 29, 2023, 03:07:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76576054.html