How to format OpenAI transcriptions in JSON format with timestamps in Python?

Question


<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

import openai

openai.organization = "org-xxxxxx"
openai.api_key = "sk-xxxxx"

audio_file_path = "/Users/tejaksha/Downloads/dhoni.mp4"

# Note: you need to be using OpenAI Python v0.27.0 for the code below to work.
# Open the file in binary mode; the context manager closes it afterwards.
with open(audio_file_path, "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

<!-- end snippet -->

With the code above, I was able to get the following output:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-html -->

{
    "text": "Flat back, just got a little tight to him, he was wagging for it, set up for the slower ball and punished it. The one's going straight down the ground. And MS Daini just taking control."
}

<!-- end snippet -->

But I want the output in the following format, with timestamps. How can I get this using OpenAI transcription?

The actual format I need is:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->

{
  "transcript": [
    {
      "text": "[Music]",
      "start": 7.39,
      "duration": 4.1
    },
    {
      "text": "once upon a time",
      "start": 16.48,
      "duration": 4.4
    },
    {
      "text": "in ancient china there lived three",
      "start": 17.6,
      "duration": 6.64
    },
    {
      "text": "old monks their names are not remembered",
      "start": 20.88,
      "duration": 6.559
    }
  ]
}

<!-- end snippet -->

Answer 1

Score: 1


I don't believe the OpenAI API returns timestamps in its default response. However, you can use the open-source whisper library, whose result includes timestamped segments.

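import whisper

# Path to your audio file. The original answer used ASRPage.output_file_path,
# an attribute from the answerer's own project that is not defined here.
audio_file_path = "/Users/tejaksha/Downloads/dhoni.mp4"

model = whisper.load_model("base")            # load the "base" Whisper model locally
audio = whisper.load_audio(audio_file_path)   # decode the audio into a waveform array
result = model.transcribe(audio)
print(result["segments"])                     # one dict per segment, with start/end times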

This does mean that you need your own GPU or PC to run the inference.
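To get from those segments to the layout asked for in the question, something like the following might work. It is only a sketch: `segments_to_transcript` is a hypothetical helper name (not part of whisper or the OpenAI library), the sample segments are made up, and it assumes each segment is a dict with `text`, `start` and `end` keys measured in seconds.

```python
import json


def segments_to_transcript(segments):
    """Convert Whisper-style segments into the {"transcript": [...]} layout.

    Assumes each segment is a mapping with "text", "start" and "end" keys,
    where "start" and "end" are in seconds.
    """
    entries = []
    for segment in segments:
        start = float(segment["start"])
        end = float(segment["end"])
        entries.append(
            {
                "text": segment["text"].strip(),
                "start": round(start, 3),
                # Duration is derived: end time minus start time.
                "duration": round(end - start, 3),
            }
        )
    return {"transcript": entries}


# Made-up segments for illustration only (not real Whisper output).
sample_segments = [
    {"start": 0.0, "end": 4.5, "text": " Flat back, just got a little tight to him,"},
    {"start": 4.5, "end": 7.25, "text": " he was wagging for it."},
]

print(json.dumps(segments_to_transcript(sample_segments), indent=2))
```

With the local library this would be called as `segments_to_transcript(result["segments"])`. The hosted transcription endpoint also documents a `response_format` parameter; with `"verbose_json"` the response should carry a similar `segments` list, so it may be worth checking the current API reference before setting up local inference.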

huangapple
  • Posted on May 24, 2023 at 20:57:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76323803.html