Amazon Transcribe Streaming API without SDK

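Question

I am trying to use Amazon's new streaming transcribe API from Go 1.11. Currently Amazon provides a Java SDK only, so I am trying the low-level way.

The only relevant piece of documentation is here, but it does not show the endpoint. I found it in a Java example: https://transcribestreaming.<region>.amazonaws.com. I am trying the Ireland region, i.e. https://transcribestreaming.eu-west-1.amazonaws.com. Here is my code to open an HTTP/2 bi-directional stream: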

import (
	"crypto/tls"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/external"
	"github.com/aws/aws-sdk-go-v2/aws/signer/v4"
	"golang.org/x/net/http2"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"time"
)

const (
	HeaderKeyLanguageCode   = "x-amzn-transcribe-language-code"  // en-US
	HeaderKeyMediaEncoding  = "x-amzn-transcribe-media-encoding" // pcm only
	HeaderKeySampleRate     = "x-amzn-transcribe-sample-rate"    // 8000, 16000 ... 48000
	HeaderKeySessionId      = "x-amzn-transcribe-session-id"     // For retrying a session. Pattern: [a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
	HeaderKeyVocabularyName = "x-amzn-transcribe-vocabulary-name"
	HeaderKeyRequestId = "x-amzn-request-id"
)

...

region := "eu-west-1"

cfg, err := external.LoadDefaultAWSConfig(aws.Config{
	Region: region,
})
if err != nil {
	log.Printf("could not load default AWS config: %v", err)
	return
}

signer := v4.NewSigner(cfg.Credentials)

transport := &http2.Transport{
	TLSClientConfig: &tls.Config{
		// allow insecure just for debugging
		InsecureSkipVerify: true,
	},
}
client := &http.Client{
	Transport: transport,
}

signTime := time.Now()

header := http.Header{}
header.Set(HeaderKeyLanguageCode, "en-US")
header.Set(HeaderKeyMediaEncoding, "pcm")
header.Set(HeaderKeySampleRate, "16000")
header.Set("Content-type", "application/json")

// Bi-directional streaming via a pipe.
pr, pw := io.Pipe()

req, err := http.NewRequest(http.MethodPost, "https://transcribestreaming.eu-west-1.amazonaws.com/stream-transcription", ioutil.NopCloser(pr))
if err != nil {
	log.Printf("err: %+v", err)
	return
}
req.Header = header

_, err = signer.Sign(req, nil, "transcribe", region, signTime)
if err != nil {
	log.Printf("problem signing headers: %+v", err)
	return
}

// This freezes and ends after 5 minutes with "unexpected EOF".
res, err := client.Do(req)
...

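The problem is that executing the request (client.Do(req)) freezes for five minutes and then ends with the "unexpected EOF" error.

Any ideas what I am doing wrong? Has anyone successfully used the new streaming transcribe API without the Java SDK?

EDIT (March 11, 2019):

I tested this again and now it does not time out but immediately returns a 200 OK response. There is an "exception" in the response body, though: {"Output":{"__type":"com.amazon.coral.service#SerializationException"},"Version":"1.0"}

I tried opening the HTTP2 stream with io.Pipe (like the code above) and also with a JSON body as described in the documentation: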

{
    "AudioStream": { 
        "AudioEvent": { 
            "AudioChunk": ""
        }
    }
}

The result was the same.
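
Because the service answers 200 OK even when something went wrong, it can help to inspect the body for an exception type before treating the call as a success. The following is a minimal sketch, assuming the body always has the shape quoted in the March 11 edit; the names serviceEnvelope and exceptionName are made up for illustration and are not part of any AWS SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// serviceEnvelope mirrors the JSON body quoted in the question.
// The struct name is hypothetical; only the field names come from the post.
type serviceEnvelope struct {
	Output struct {
		Type string `json:"__type"`
	} `json:"Output"`
	Version string `json:"Version"`
}

// exceptionName returns the short exception name (the part after '#'),
// or an empty string if the body carries no __type field.
func exceptionName(body []byte) (string, error) {
	var env serviceEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", err
	}
	if i := strings.LastIndex(env.Output.Type, "#"); i >= 0 {
		return env.Output.Type[i+1:], nil
	}
	return env.Output.Type, nil
}

func main() {
	body := []byte(`{"Output":{"__type":"com.amazon.coral.service#SerializationException"},"Version":"1.0"}`)
	name, err := exceptionName(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(name) // prints "SerializationException"
}
```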

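EDIT (March 13, 2019):

As mentioned by @gpeng, removing the content-type from the headers fixes the SerializationException. But then there is an IAM exception, and you need to add the transcription:StartStreamTranscription permission to your IAM user. That permission is nowhere to be found in the AWS IAM console, though, and must be added manually as a custom JSON permission :/

There is also a new/another documentation page here, which shows an incorrect host and a new content-type (do not use that content-type; the request will return 404 with it).

After removing the content-type and adding the new permission, I now get the exception {"Message":"A complete signal was sent without the preceding empty frame."}. Also, writing to the pipe blocks forever, so I am stuck again. The messages described in the new documentation are different from those in the old one, now finally binary, but I do not understand them. Any ideas how to send such HTTP2 messages in Go?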

EDIT (March 15, 2019):

If you get an HTTP 403 error about a signature mismatch, do not set the transfer-encoding and x-amz-content-sha256 HTTP headers. When I set them and signed the request with the AWS SDK's V4 signer, I received HTTP 403 The request signature we calculated does not match the signature you provided.

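Answer 1

Score: 4

I reached out to AWS support and they now recommend using WebSockets instead of HTTP/2 when possible (blog post here).

If this fits your use case, I would highly recommend checking out the new example repo at https://github.com/aws-samples/amazon-transcribe-websocket-static, which shows a browser-based solution in JS.

I've also noticed that the author of the demo has an Express example on his personal GitHub at https://github.com/brandonmwest/amazon-transcribe-websocket-express, but I haven't confirmed whether it works.

I appreciate these examples aren't in Python, but I think you'll have better luck using the WebSocket client as opposed to HTTP/2 (which, let's be honest, is still a bit terrifying :P).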

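Answer 2

Score: 1

Try not setting the content type header and see what response you get. I'm trying to do the same thing (but in Ruby) and that 'fixed' the SerializationException. I still can't get it to work, but I've now got a new error to think about.

UPDATE: I have got it working now. My issue was with the signature. If both the host and authority headers are passed, they are joined with a comma and treated as host on the server side when the signature is checked, so the signatures never match. That doesn't seem like correct behaviour on the AWS side, but it doesn't look like it's going to be an issue for you in Go.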

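Answer 3

Score: 0

I'm still fighting this thing with Node.js as well. What is not clear in the docs is that in one place they say the Content-Type should not be application/json, but in another place they make it look like the payload should be encoded as application/vnd.amazon.eventstream. It looks like the payload should be carefully formatted in a binary format instead of as a JSON object, as follows:

> Amazon Transcribe uses a format called event stream encoding for streaming transcription. This format encodes binary data with header information that describes the contents of each event. You can use this information for applications that call the Amazon Transcribe endpoint without using the Amazon Transcribe SDK.
> Amazon Transcribe uses the HTTP/2 protocol for streaming transcriptions. The key components for a streaming request are:
>
> + A header frame. This contains the HTTP headers for the request, and a signature in the authorization header that Amazon Transcribe uses as a seed signature to sign the following data frames.
>
> + One or more message frames in event stream encoding. The frame contains metadata and the raw audio bytes.
>
> + An end frame. This is a signed message in event stream encoding with an empty body.

There is a sample function that shows how to implement all of that using Java, which might shed some light on how this encoding is to be done.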

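Answer 4

Score: 0

I had a similar requirement for using the AWS Transcribe service with its WebSocket API in Node.js. Since the official package did not support this yet, I went ahead and wrote a package called AWS-transcribe, which can be found here. I hope that helps.

It provides a stream interface around the WebSocket and can be used like the example below: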

import { AwsTranscribe, StreamingClient } from "aws-transcribe"

const client = new AwsTranscribe({
    // if these aren't provided, they will be taken from the environment
    accessKeyId: "ACCESS KEY HERE",
    secretAccessKey: "SECRET KEY HERE",
})

const transcribeStream = client
    .createStreamingClient({
        region: "eu-west-1",
        sampleRate,
        languageCode: "en-US",
    })
    // enums for the event names that the returned stream will emit
    .on(StreamingClient.EVENTS.OPEN, () => console.log(`transcribe connection opened`))
    .on(StreamingClient.EVENTS.ERROR, console.error)
    .on(StreamingClient.EVENTS.CLOSE, () => console.log(`transcribe connection closed`))
    .on(StreamingClient.EVENTS.DATA, (data) => {
        const results = data.Transcript.Results

        if (!results || results.length === 0) {
            return
        }

        const result = results[0]
        const final = !result.IsPartial
        const prefix = final ? "recognized" : "recognizing"
        const text = result.Alternatives[0].Transcript
        console.log(`${prefix} text: ${text}`)
    })

someStream.pipe(transcribeStream)

huangapple
  • Posted on December 12, 2018 at 21:09:40
  • Please keep this link when reposting: https://go.coder-hub.com/53743785.html